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ABSTRACT 


We present a novel regression framework centered on a coherent and averse mea¬ 
sure of risk, the superquantile risk (also called conditional value-at-risk), which yields 
more conservatively htted curves than classical least squares and quantile regressions. 
In contrast to other generalized regression techniques that approximate conditional 
superquantiles by various combinations of conditional quantiles, we directly and in 
perfect analog to classical regression obtain superquantile regression functions as op¬ 
timal solutions of certain error minimization problems. We show the existence and 
possible uniqueness of regression functions, discuss the stability of regression func¬ 
tions under perturbations and approximation of the underlying data, and propose an 
extension of the coefficient of determination R-squared and Cook’s distance for as¬ 
sessing the goodness of £t for both quantile and superquantile regression models. We 
present two classes of computational methods for solving the superquantile regression 
problem, compare both methods’ complexity, and illustrate the methodology in eight 
numerical examples in the areas of military applications, concerning mission employ¬ 
ment of U.S. Navy helicopter pilots and Portuguese Navy submariners, reliability 
engineering, uncertainty quantification, and financial risk management. 
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EXECUTIVE SUMMARY 


Analysts are often concerned with upper-tail realizations of random variables de¬ 
scribing loss, cost, damage of a system and attempt to approximate such loss random 
variables in terms of explanatory random variables that are more accessible in some 
sense. We develop a novel regression framework that naturally extends least squares 
and quantile regressions to contexts where an analyst seeks to assess regression errors 
not by squaring them, as in the case of least squares regression, or by looking at their 
signs, as in the case of quantile regression, but by weighing larger errors increasingly 
heavily in a way consistent with a coherent and averse risk measure, the superquantile 
risk measure (also called conditional value-at-risk). 

In contrast to other generalized regression techniques that approximate condi¬ 
tional superquantiles by various combinations of conditional quantiles, this framework 
for superquantile regression is the first attempt to use superquantiles directly in a re¬ 
gression model. The only assumption we require is that the involved random variables 
have finite second moment. We rely on the superquantile-based risk quadrangle and 
use the corresponding relations between measures of deviation, risk, and error applied 
to the superquantile as the statistic to obtain superquantile regression functions as 
optimal solutions of an error minimization problem. We develop the fundamental 
theory for superquantile regression and build an alternative problem, the deviation- 
based superquantile regression problem, which determines the regression coefficients 
by minimizing a measure of deviation as opposed to a measure of error, leading to 
computational advantages in problem size and simplification of the objective function. 
We examine existence and uniqueness of the obtained regression functions as well as 
consistency and stability of the regression functions under perturbations due to pos¬ 
sible measurement errors and from approximating empirical distributions generated 
by samples of the underlying data. We develop rate of convergence results under mild 
assumptions. 


XV 



In this dissertation, we construct a model validation technique by extending 
the concept of coefficient of determination used in least squares regression to both 
quantile and superquantile regression. We show that these coefficients of determina¬ 
tion are bounded between 0 and 1, with values near 1 preferred, and we also demon¬ 
strate that the superquantile regression problem in fact maximizes the coefficient of 
determination when it aims to minimize the error of the loss random variable by wisely 
selecting the regression coefficients. Since adding explanatory random variables pos¬ 
sibly increases the coefficient of determination, we dehne an adjusted coefficient of 
determination for quantile and superquantile regression. Another validation analy¬ 
sis tool that we develope is the concept of Cook’s distance applied to quantile and 
superquantile regression. 

We present two classes of computational methods for solving superquantile 
regression problems. The hrst computational method is denoted primal method, 
where we minimize the superquantile deviation measure using analytical integration 
or numerical integration schemes. The second computational method is based on the 
dualization of risk. We build a new superquantile regression problem by using the 
expression of risk and deviation. We compare the complexity of the methods and 
demonstrate which ones are more efficient according to the data size and show that 
dual methods are superior and only marginally slower than methods for least squares 
regression. 

Finally, we present a series of numerical examples that show some of the ap¬ 
plication of superquantile regression, such as superquantile tracking and surrogate 
estimation, that we encounter in the areas of hnancial risk management, military 
applications, reliability engineering, and uncertainty quantihcation. We compare 
computational methods by presenting their runtimes and see how the coefficient of 
determination and the adjusted one can be relevant in assessing the goodness of £t 
of the obtained regression models. 


XVI 



I. INTRODUCTION 

A. MOTIVATION AND BACKGROUND 

One of the major concerns among analysts is how to address random variables 
describing possible “cost,” loss, and “damage,” but for which there is incomplete 
distributional information available. A possibility is to attempt to approximate such a 
loss random variable by a combination of explanatory random variables that are more 
accessible in some sense. This situation naturally leads to least squares regression 
and related models that estimate conditional expectations. While such models are 
adequate in many situations, they fall short in contexts where a decision maker is risk 
averse, i.e., is more concerned about upper-tail realizations of the loss random variable 
than average loss, and views errors asymmetrically with underestimating losses being 
considered more prejudicial than overestimating. 

Another approach is based on quantile regression (see Koenker, 2005; Gilchrist, 
2008 and references therein), which accommodates risk-averseness and an asymmetric 
view of errors by estimating conditional quantiles at a certain probability level such 
as those in the tail of the conditional distribution of the loss random variable. While 
suitable in some contexts, quantile regression only deals with the signs of the errors 
and therefore might be overly “robust” in the sense that portions of a data set can 
change without necessarily impacting the best-£t regression function, as illustrated 
below. 

In this dissertation, we focus on contexts where a decision maker is concerned 
about upper-tail realizations of the loss random variable, and errors are not only seen 
asymmetrically but their magnitude is also taken into account. Of course, a parallel 
development with an opposite orientation, focused on profits and gains, and concerns 
about overestimating instead of underestimating is also possible but not considered 
in this dissertation. 

Before we proceed with the literature review, we analyze one simple example. 
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We consider a loss random variable Y and an available explanatory random variable 
X. Since the distribution of the loss random variable Y might not be fully known, it 
may be benehcial to approximate Y by this random vector X. 

For this example, we have a table of 50 pairs of observations available, {x^,y^}, 
with i = 1, ...,50, as seen in the scatter plot in Figure 1. We consider a regression 
function of the form f{x) = cq + cx, with cq, c G M. This numerical example is 
artihcially designed to show how different regression techniques react to small data 
changes. 



Figure 1. Scatter plot of the data for the constructed example. 

Figure 2(a) gives the least squares and 0.75-quantile regression functions. We 
observe that the 0.75-quantile regression function divides the data set into two, such 
that 25% of the observations remain above the obtained regression, while the remain¬ 
ing 75% lay below. 
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X 


(a) Before any changes in the data set. 



X 


(b) After shifting six observations upwards. 

Figure 2. Least squares regression vs. quantile regression at a probability level 
a = 0.75, before and after some changes in the data set. 
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In Figure 2(b), we see how the least squares and the quantile regression models 
adjust to changes in the data set, denoted by the red dots. Notice that the observa¬ 
tions are moved upwards without changing their position relative to the 0.75-quantile 
regression curve. The balance of 25% of the observations above and 75% of them 
below the quantile regression curve has not been compromised. Therefore, as we can 
observe in Figure 2(b), the quantile regression curve does not shift after modifying 
the six observations. Such robustness is sometimes desirable, but at other times there 
is the need for responsiveness. In comparison, the least squares regression function 
shifts upwards reacting to the data changes. 



X 


Figure 3. Least squares regression vs. quantile regression at a probability level 
a = 0.75, before and after changing one observation in the data set. 


Changing only one observation, as shown in Figure 3, we note that the ob¬ 
tained quantile regression function changes its slope, while the least squares regression 
function hardly changes. If we shift this observation even further, the change in the 
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Figure 4. Least squares vs. quantile regression at a probability level a = 0.60. 

slope for the quantile regression function is even more significant. Once again the 
least squares regression model hardly changes. If we change this observation in red 
even further upwards, we would notice no more changes in the quantile regression 
function obtained in Figure 3, since the balance of the data above and below the 
quantile regression would no longer be compromised. 

Quantile regression is a robust regression technique, but its sensitivity to 
changes in data might sometimes be too small as indicated above. Other times the 
sensitivity might be too large as illustrated above where the change of a single data 
point triggers a jump in the regression curve. On the contrary, the least squares re¬ 
gression is more stable, with smooth adjustments in the curve comparable to changes 
in the data set. 

As another motivation to this novel regression technique, we consider a real- 
world data set: the Portuguese Navy submariners effort index, provided by the Por¬ 
tuguese Navy Submarine Squadron. In this data set we seek to estimate the random 
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variable Y that represents the effort index of the submariners. This index was cre¬ 
ated as a decision tool to support human resource management inside the Submarine 
Squadron. It allows planners to assess which submariners are “due” for another mis¬ 
sion. 

In Figure 4, we have 103 observations of number of years since a submariner 
has gained the insignia of the Portuguese submarine service (Xdoiphins) against the 
submariners effort index (T). In red and blue colors, we see two regression functions, 
the least squares and the 0.60-quantile regression, respectively. The 0.60-quantile 
regression £t analyzes the sign of the errors defined as the differences between the 
loss random variable Y and the chosen linear model. Instead of only regarding the 
signs of these errors, we want to also account for their magnitudes, namely we want 
to analyze the average of the 40% highest effort indices. 

These two examples motivate the need to move beyond least squares and quan¬ 
tile regression and develop superquantile regression. They illustrate how a regression 
technique such as the quantile regression, which accommodates risk-averseness and an 
asymmetric view of errors, may not be suitable in some contexts where the decision 
maker is also concerned with the magnitude of those errors as well as the “average 
worst-case” behavior. 

B. CONNECTIONS WITH THE LITERATURE 

A quantile corresponds to “value-at-risk” (VaR) in financial terminology and 
relates to “failure probability” in engineering terms. Quantile regression informs 
the decision maker about these quantities conditional on values of the explanatory 
random vector X. However, a quantile is not a coherent measure of risk in the 
sense of Artzner et al. (1999) (see also Delbaen, 2002); it fails to be subadditive. 
Consequently, a quantile of the sum of two random variables may exceed the sum 
of the quantiles of each random variable at the same probability level, which runs 
counter to our understanding of what “risk” should express. Moreover, quantiles cause 
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computational challenges when incorporated into decision optimization problems as 
objective function, failure probability constraint, or chance constraint. The use of 
quantiles and the closely related failure probabilities is therefore problematic in risk- 
averse decision making; see Artzner et al. (1999), Rockafellar and Uryasev (2000), 
Rockafellar and Royset (2010), Krokhmal et al. (2011), and Rockafellar and Uryasev 
(2013) for a detailed discussion. 

A superquantile of a random variable, also called “conditional value-at-risk” 
(CVaR), average value-at-risk, and expected shortfall, is an “average” of certain quan¬ 
tiles as described further below. We prefer the application-neutral name “superquan¬ 
tile” when deriving methods applicable broadly. This is a coherent measure of risk 
well suited for risk-averse decision making and optimization; see Wang and Urya¬ 
sev (2007) for its application in hnancial engineering, Kalinchenko et al. (2011) for 
military applications, and Rockafellar and Royset (2010) for use in reliability engi¬ 
neering. While this risk measure has reached prominence in risk-averse optimization, 
there has been much less work on regression techniques that are consistent with it in 
some sense. 

The foundation of least squares and quantile regression is the fact that mean 
and quantiles minimize the expectation of certain convex random functions. A nat¬ 
ural extension to superquantile regression could then possibly involve determining a 
random function that when minimizing its expectation, we obtain a superquantile. 
However, such a random function does not exist (as discussed in Gneiting, 2011; Chun 
et ah, 2012), which has led to studies of indirect approaches to superquantile tracking 
grounded in quantile regression. 

For a random variable with a continuous cumulative distribution function, 
a superquantile equals a conditional expectation of the random variable given real¬ 
izations no lower than the corresponding quantile. Utilizing this fact, studies have 
developed kernel-based estimators for the conditional probability density functions, 
which are then integrated and inverted to obtain estimators of conditional quantiles. 
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An estimator of the conditional superquantile is then hnally constructed by integrat¬ 
ing the density estimator over the interval above the quantile (Scaillet, 2005; Cai 
& Wang, 2008) or forming a sample average (Kato, 2012). These studies also in¬ 
clude asymptotic analysis of the resulting estimators under a series of assumptions, 
including that the data originates from certain time series. 

A superquantile of a random variable is dehned in terms of an integral of 
corresponding quantiles with respect to the probability level. Since the integral is 
approximated by a weighted sum of quantiles across different probability levels, an 
estimator of a conditional superquantile emerges as the sum of conditional quantiles 
obtained by quantile regression; see Peracchi and Tanase (2008), and Leorato et ah 
(2012), which also show asymptotic results under a set of assumptions including the 
continuous differentiability of the cumulative distribution function of the conditional 
random variables. Similarly, Chun et ah (2012) utilizes the integral expression for a 
superquantile, but observes that a weighted sum of quantiles is an optimal solution 
of a certain minimization problem; see Rockafellar and Uryasev (2013). Analogous to 
the situation in least squares and quantile regression, an optimization problem yields 
an estimator of a conditional superquantile. Though, in contrast to the case of least 
squares and quantile regression, the estimator is “biased” due to the error induced by 
replacing an integral by a hnite sum. Under a linear model assumption, Chun et ah 
(2012) also constructs a conditional superquantile estimator using an appropriately 
shifted least squares regression curve based on quantile estimates of residuals. In both 
cases, asymptotic results are obtained for a homoscedastic linear regression model. 
Under the same model, Trindade et ah (2007) studies “constrained” regression, where 
the error random variable Zj = Y — f{X) is minimized in some sense, for example 
in terms of least square or absolute deviation, subject to a constraint that limits a 
superquantile of Zf. While this approach does not lead to superquantile regression in 
the sense we derive in this dissertation, it highlights the need for alternative techniques 
for regression that incorporate superquantiles in some manner. 



The need for moving beyond classical regression centered on conditional expec¬ 
tations is now well recognized and has driven even further research towards estimating 
conditional distribution function, i.e., P {Y[x) < y} for all y & M, using nonparamet- 
ric kernel estimators (see for example Hall & Muller, 2003) and transformation models 
(see for example Hothorn et ah, 2014). We denote by Y{x) the conditional random 
variable Y given that X = x & Of course, conditional distribution functions 
provide the “full” information about Y(x) including its quantiles and superquantiles, 
and therefore also provide a means to inform a risk-averse decision maker. In this 
dissertation, however, we directly focus on superquantiles, which we believe deserve 
special attention due to their prominence in risk analysis. 

A framework for generalized regression is laid out in Rockafellar et al. (2008), 
and Rockafellar and Uryasev (2013), and regression functions are obtained as optimal 
solutions of optimization problems of the form mmfS{Zf), where T is a measure of 
error and / is restricted to a certain class of functions such as the affine functions. 
Least squares regression is obtained by £{Zf) = E[Z‘j\, quantile regression with the 
Koenker-Bassett measure of error, but many other possibilities exist. While it is not 
possible to determine a measure of error that is of the expectation type and yields 
a superquantile, in Section II.A we show that when allowing for a broader class of 
functionals, a measure of error that generates a superquantile is indeed available. 
Such a measure of error is also hinted at in Rockafellar and Royset (2014b), but this 
dissertation as well as the supporting paper by Rockafellar et al. (2014) gives the 
hrst comprehensive treatment. In contrast to previous studies towards superquan¬ 
tile tracking, which utilize indirect approaches and quantile regression, we here offer 
a natural extension of least squares and quantile regression. We replace the mean- 
squares and Koenker-Bassett (cf. eq. (II.9)) error measures by a new error measure, 
and then simply minimize that error of Zf to obtain a regression function. Un¬ 
der few assumptions, we establish the existence of a regression function, discuss its 
uniqueness, and examine stability under perturbations of the distribution of {X,Y) 
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for example caused by sampling. We omit a discussion of simple linear models with 
independent and identically distributed (iid) noise as we believe that there is little 
need for quantile and superquantile regression in such contexts as least squares re¬ 
gression with an appropriate shift suffices. In fact, we do not separate models into 
(additive) deterministic and stochastic terms. In many applications, especially in the 
area of uncertainty quantihcation, heteroscedasticity and dependence are prevalent 
making linear iid and additive models of little value. 

C. SCOPE OF DISSERTATION 

In this dissertation, we focus on two distinct situations where the importance 
of a novel regression methodology becomes apparent. We consider a loss random 
variable Y for which there is incomplete distributional information available, and an 
explanatory random variable X that is more accessible in some sense. 

We denote the first situation and the one we address more often during this 
dissertation by surrogate estimation. It usually occurs when the explanatory random 
variable is beyond our direct control, but the dependence between the loss and the 
explanatory random variable makes us hopeful that, for a carefully selected regression 
function, such explanatory random variable may serve as a surrogate for the loss 
random variable. When the distribution of the explanatory random variable is known, 
at least approximately, and the regression function has been determined, then the 
distribution of f{X) is usually easily accessible. That distribution may then serve 
as input to further analysis, simulation, and optimization in place of the unknown 
distribution of the loss random variable Y. Such surrogate estimation may arise 
in numerous contexts. “Factor models” in hnancial investment applications are a 
result of surrogate estimation (see for example Connor, 1995; Knight et ah, 2005), 
where the random variable we aim to estimate may be the loss associated with a 
particular asset and the explanatory variable a vector describing a small number of 
macroeconomic “factors.” “Uncertainty quantihcation” (see for example Lee & Chen, 
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2009; Eldred et al., 2011) considers the output of a system described by a random 
variable, for example measuring damage, and estimates its moments and distribution 
from observed realizations as well as knowledge about the distribution of the input 
to the system characterized by an explanatory random vector. A main approach here 
centers on surrogate estimation with the obtained regression function serving as an 
estimate of the loss random variable. 

Another situation arises when the primary concern is with the conditional 
loss given that the explanatory random variable X takes on specihc values. We aim 
to select these values judiciously in an effort to minimize the conditional loss. We 
denote this second situation by superquantile tracking. Of course, “minimizing” Y[x) 
is not well-dehned and a standard approach is to minimize a risk measure of Y{x)-, 
see for example Krokhmal et al. (2011), and Rockafellar and Uryasev (2013). An 
attractive choice is to use a superquantile measure of risk, which has nice properties 
and is also computationally approachable. While in some contexts a superquantile of 
the conditional loss can be evaluated easily for any specihc value of the explanatory 
random vector, there are numerous situations, especially beyond the hnancial domain, 
where only a table of realizations of conditional loss is available for various values of 
the explanatory random vector. In the latter situation, there is a need for building 
an approximating model, based on the data, for the relevant superquantile of the 
conditional loss as a function of the explanatory variables. 

D. CONTRIBUTIONS 

The main contribution of this dissertation is the development of a novel re¬ 
gression framework that naturally extends least squares and quantile regressions to 
contexts where one seeks to assess regression errors not by squaring them, as in the 
case of least squares regression, or by looking at their signs, as in the case of quantile 
regression, but by weighing larger levels of underestimation increasingly heavily in a 
ma nn er consistent with superquantiles. 
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This generalized regression technique is the first attempt to use superquantiles 
directly in the regression model as opposed to an approximation of conditional quan¬ 
tiles. We develop the fundamental theory for the new regression technique and deal 
with issues encountered in any generalized regression framework, such as existence 
and possible uniqueness of the obtained regression functions. We discuss consistency 
and stability of these regression functions under perturbations due to possible mea¬ 
surement errors and approximating empirical distributions generated by samples of 
the underlying distribution. And we also examine rate of convergence results under 
mild assumptions. We present means of assessing the goodness of £t of the obtained 
quantile and superquantile regression models, by applying the concepts of coefficient 
of determination, adjusted coefficient of determination, and Cook’s distance to quan¬ 
tile and superquantile regression techniques. 

We develop two distinct classes of computational methods, one solving the 
superquantile regression problem by means of analytical and numerical integration 
techniques, another by relying on the dualization of risk as a step to build a new 
regression problem that we apply to discrete cases. We discuss complexity results of 
both classes of computational methods, and compare them to the complexity results 
for least squares and quantile regressions. 

We present a series of numerical examples from the areas of financial invest¬ 
ment, military applications, reliability engineering, and uncertainty quantification. 

E. DISCLAIMER 

The information presented and views expressed in this dissertation do not 
reflect the official policy or position of the U.S. Navy, the U.S. Department of De¬ 
fense, the U.S. Government, the Portuguese Navy, and the Portuguese Ministry of 
National Defense or the Portuguese Government. The data sets we use in our two 
military applications numerical examples are obtained from unclassified sources, and 
are employed in this dissertation in order to illustrate some interesting and meaningful 
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conclusions from our theoretical results. 

The hrst military application example considers the results of an online survey 
of winged Naval Helicopter Pilots of the U.S. Navy; see Phillips (2011) for details. 
As stated in Phillips (2011), this study is approved by the NPS Institutional Review 
Board (IRB) and has an IRB protocol number: NPS.2011.0053-IR-EP7-A. The second 
military application example considers a data set provided by the Portuguese Navy 
Submarine Squadron. 

F. ORGANIZATION 

Chapter II addresses the foundations of the superquantile regression, as an 
extension of least squares and quantile regressions. The chapter discusses the su¬ 
perquantile regression problem, the issues encountered in such generalized regression 
frameworks, and provides an approach for assessing the goodness of £t of the obtained 
quantile and superquantile regression models. 

Chapter III develops two classes of computational methods to solve superquan¬ 
tile regression problems. The first denoted by primal method solves superquantile 
regression problems using analytical and numerical integration schemes. The second 
which we call the dual method is based on the dualization of risk and utilizes such 
advantages to build a new superquantile regression problem with promising compu¬ 
tational performance, especially for large sample sizes. It also discusses complexity 
results for the presented algorithms. 

Chapter IV provides several numerical results that illustrate not only the pri¬ 
mal and dual methods, but also some of the main applications of the superquantile 
regression, such as superquantile tracking and surrogate estimation. 

Chapter V summarizes the theoretical and numerical results, presents our 
conclusions and suggests future research opportunities. 
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II. FOUNDATIONS OF SUPERQUANTILE 

REGRESSION 

In this chapter, we develop a regression techniqne that extends least sqnares 
and quantile regressions, centered on expectations and quantiles, respectively, to one 
that focuses on superquantiles. This material to a large extent is based on Rockafellar 
et al. (2014). 

Section II.A describes measures of error, risk, deviation, and regret, first in the 
context of quantile regression and then for the extension to superquantile regression. 
Section II.B defines superquantile regression as the minimization of a measure of er¬ 
ror, provides an alternative approach for solving superquantile regression problems 
based on the measure of deviation, discusses existence and uniqueness of the regres¬ 
sion function, and provides asymptotic results. Section II. C proposes an approach 
for assessing the goodness of fit of the regression function obtained by quantile and 
superquantile regressions, using extensions of the definitions of coefficient of determi¬ 
nation and Cook’s distance. 

A. QUANTILES, SUPERQUANTILES, AND ERRORS 

While our development centers on superquantiles, it is beneficial to maintain 
a parallel description of quantiles. As we will see in Subsection II.A.4, quantile 
regression achieved by minimizing a Koenker-Bassett error of the random variable Zf, 
as seen in Subsection II.A.3 in more detail, provides a road map for the construction 
of superquantile regression, which is simply achieved by minimizing another measure 
of error. We start, however, with definitions and assumptions, and then provide an 
overview of the fundamental risk quadrangle, its application to the superquantile as 
the statistic, and finally we present the corresponding measures of error, deviation, 
and regret of quantiles and superquantiles. 
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1. Definitions and Assumptions 

We consider a loss random variable F as a fnnction on a probability space 
and in onr context, we assnme that Y has a hnite second moment, as 

follows 

Y ■.= J^, P) := {F : ^ iR I F is P-measurable, E[Y^] < oo}. (II.l) 

Here hi is a sample space with ca G hi being a possible ontcome; P is an event 
space; and P is a probability measnre that assigns probabilities to these events, 
P ; [0,1], 

We now give some usefnl dehnitions. We consider the following distinct func¬ 
tionals on in the sense of Rockafellar and Uryasev (2013), that assign numerical 
values to random variables, e.g., a loss random variable F. A measure of error S(Y) 
quantihes the “nonzeroness” in F. The £^-norm of F is a possible measure of error. 
A measure of risk P(F) serves as surrogate for the overall loss in F. For example, 
one could think of P(F) = sup{F} (the essential supremum) as such a surrogate, 
or less conservatively P(F) = P[F]. A measure of deviation ViY) quantihes the 
“nonconstancy” as uncertainty in F, and can be seen as a generalization of the stan¬ 
dard deviation of F. A measure of regret V(F) quantihes the displeasure of obtaining 
mix realizations of F, which might be better when F < 0 (representing “gains”) or 
worse when F > 0 (representing “losses”). And a statistie S{Y) is associated with F 
through S and V, as described below. 

According to Rockafellar et al. (2008), we say that a measure of risk is coherent 
if the following axioms hold: 

(i) P(c) = c for a constant c. 

(ii) P(AF) = \1Z{Y) when A > 0 (positive homogeneity). 

(hi) P(F + F') < P(F) + P(F') (subadditivity). 

(iv) P(F) < TZ{Y') when Y <Y' (monotonicity). 
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This definition is equivalent to the one described in Artzner et ah (1999), where 
axiom (i) is replaced by translation invariance. When we refer to a coherent measure 
of risk, we refer to the axioms listed above. The concept of a coherent measure of risk 
is important in our context because it follows the natural way we think about risk, 
where monotonicity is a requirement. Moreover, if TZ{Y) > i?[T], for a nonconstant 
random variable Y, then TZ{ ) is averse. 

According to Rockafellar and Uryasev (2013), a regular measure of risk sat¬ 
isfies the axiom (i) stated previously, as well as convexity, aversity, and closedness, 
{Y\7l(Y) < c} for all constants c E M. Obviously, the expectation is not averse, 
therefore not regular. 

Examples of measures of risk are quantiles and super quant lies of a loss random 
variable Y at distinct probability levels a, as we define below. For a probability 
level a G (0,1), the a-quantile of a random variable Y with cumulative distribution 
function Fy is defined as 

g„(y) := min {y E]R\ Fyiy) > a} . 

Its quantiles are as fundamental to Y as the distribution function, but are problem¬ 
atic to incorporate in risk analysis and optimization due to their lack of coherency 
as well as increased computational challenges; see Rockafellar and Royset (2014b). 
Superquantiles have more favorable properties. For a G [0,1), the a-superquantile of 
a random variable Y is defined as 

Uy)--=^ [\qiy)d/3. (II.2) 

1 ^ J a 

Since a superquantile is a coherent measure of risk (see Rockafellar & Uryasev, 2000; 
Rockafellar & Uryasev, 2002) and by virtue of being an “average” of quantiles, it is 
also more stable than a quantile in some sense, and is well suited for applications. 
For a = 1, we define qaiX) ■= sup{y}. Since 

C Fy\/3)d/3 = 

0 


E[Y], (II.3) 


qo{y) = / qq{y)df3 
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we therefore focus on a G (0,1) throughout the dissertation to avoid distractions by 
these special cases. 

Equivalent to equation (II.2), we have an even more stable and conservative 
measure of risk, the a-second-order superquantile, and it is dehned as 

Uy)--=T^ tq^iy)d/3, (II.4) 

^ ^ J a. 

for a random variable Y ^ C? and a G (0,1). 

In reliability terminology, quantiles and superquantiles correspond to failnre 
and buffered failure probabilities. The failure probability of a loss random variable Y 

is 

p(F):=P{y >0} = 1-Fy(0), 

which corresponds to 

p{Y) = 1 — 0 ; with a such that qaiX) = 0 

if there is no probability atom at zero. Analogously to the latter expression, the 
buffered failure probability (see Rockafellar & Royset, 2010) of a loss random variable 
Y is dehned as 

p{Y) := 1 — a with a such that qaiX) = 0. (II-5) 

Requiring that p(Y) < 1 — o; is therefore equivalent to the constraint qa(Y) < 0. 
Consequently, in applications with a buffered failure probability constraint on a (con¬ 
ditional) loss random variable Y[x) as well as when the goal is to minimize a su- 
perqnantile of Y{x) directly, there is a need to estimate qa(Y(x)) as a function of 
X G J?". Qnantiles and superqnantiles are connected through a trade-off formula 
that leads to quantile regression, as discussed in Subsection II.A.3. 

2. Overview of the Fundamental Risk Quadrangle 

The “Fnndamental Risk Quadrangle” is a concept introduced by Rockafellar 
and Uryasev (2013), which establishes the connections between distinct measures, 
described in Snbsection II.A. 1, of a random variable whose orientation is such that 
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upper-tail realizations are unfortunate and low realizations are favorable, as described 
in Chapter I. The interrelationships of such numerical quantities allow distinct com¬ 
parisons and applications in various analyses, such as risk management. 

Diagram 3 in Rockafellar and Uryasev (2013) dehnes the general relationships 
between hve properties of a random variable Y, measures of error, risk, deviation, 
and regret, and the corresponding statistic, as we list below. We use these general 
relationships in the next two subsections. 

Error measure = SiY) = V{Y) — E\Y] 

Risk measure = IZiY) = minc(,{co -|- V{Y — cq)} 

Deviation measure = T>{Y) = 'R-{Y) — E[Y] 

Regret measure = V{Y) = £{Y) + E\Y] 

Statistic = 5(R) = argmin^^jlco -|- V{Y — cq)} = axgm.m.^{S{Y — cq)} 

We now look at the families of risk quadrangles where the expectation and the 
quantile are the statistic. The following two risk quadrangles are described in detail 
in Rockafellar and Uryasev (2013). We list both quadrangles for illustration and to 
exemplify how one obtains least squares and quantile regressions by minimizing a 
certain measure of error. 


Variance Version of Mean-based Quadrangle: 

(Example 1’ in Rockafellar & Uryasev, 2013) 

Error measure = £{Y) = 

Risk measure = 'R,{Y) = E[Y] + Xa‘^{Y) 
Deviation measure = V(Y) = Xa‘^{Y) 

Regret measure = V{Y) = E\Y] + XE\Y’^] 
Statistic = SiY) = E\Y] = mean 
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Quantile-based Quadrangle: (at any probability level a G (0,1)) 
(Example 2 in Rockafellar & Uryasev, 2013) 

Error measure = Sa(Y) = E max {0, E} + max {0, —E}] 
Risk measure = IZaiY) = qa(X) = u-superquantile 
Deviation measure = Dq,(E) = qa(Y) — E[Y] 

Regret measure = Va(E) = j^E [max{0,E}] 

Statistic = S(Y) = qa(Y) = a-quantile 


With the idea in mind that one minimizes a measure of error to obtain its correspond¬ 
ing statistic in the sense of the “Fundamental Risk Quadrangle,” we realize that this 
approach allows us to naturally extend the existing foundations of least squares and 
quantile regressions to create new foundations for superquantile regression. 

3. Quantile Regret and Error Measures 

Both tt-quantiles and a-superquantiles, with a G (0,1), of a loss random 
variable E are expressed in terms of an optimization problem involving the measure 
of regret 

V«(E) := ^E[max{0,E}], 

1 — Q; 

as seen in Rockafellar and Uryasev (2013). Quantiles and superquantiles then follow 
as 


qa(y) e argmin {cq + V„(E - cq)} (II.6) 

Cq^R 

ga(E) =min{co +V„(E-co)}, (II.7) 

co&R 

where in fact qa(X) is the lowest optimal solution if multiple solutions exist. 

The expression for qa{Y) is the essential building block for quantile regression, 
but since we ultimately wish to go beyond the class of constant functions as candidates 
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for a regression function we need to pass to a measure of error constructed from 
Vq, by setting 

So.{Y) ■.= V^{Y)-E[Y] 

for any loss random variable Y (with -£^[11^1] < cxd). Direct application of the definition 
of the measure of error and recognition that a constant term in an objective function 
is immaterial with respect to the optimal solution gives 


with 


qa(Y) e aigminSa(Y - Cq), (II. 8) 

cq^R 


Sa{Y - Co) 



—i?[max{0, Y — cq}] — E[Y — cq] 
a 

Q; , , , 

--maxjO, Y — coj + maxjO, —Y + coj 

1 — a 


(11,9) 


being a (scaled) Koenker-Bassett error (Koenker, 2005). Quantile regression centers 
on computing this argmin with “minimizing the error of D — Cq over Cq G M” replaced 
by “minimizing the error of D — f{X) over a class of functions / : iR" —)■ M,” often 
taken to be the affine functions. We view qa{Y) as the “closest” scalar to the random 
variable Y under a Koenker-Bassett error. 

If our goal simply were to estimate qaiY) of a loss random variable Y for a 
given probability level a G (0,1), the above expressions would have sufficed, pos¬ 
sibly passing to an empirical distribution given by a sample if Ey is unknown. In 
the present context, however, connections with the underlying explanatory random 
vector X and the focus on the “approximation” of Y warrants a parallel develop¬ 
ment to that of quantile regression but now centered on a superquantile. In view of 
the above review of quantile regression, it is clear that superquantile regression will 
involve the minimization of some measure of error that returns the superquantile as 
argmin. Classical least squares regression can be viewed similarly as returning a (con¬ 
ditional) expectation as argmin when minimizing the mean-square measure of error, 
i.e., E\Y] = argmin^jjg^i?[(K — cq)^]. The next subsection develops such a measure 
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of error by first constructing a corresponding measure of regret, for the superquantile 
as the statistic. 

4. Superquantile Regret and Error Measures 

We start this subsection by establishing the hniteness of a superquantile under 
the assumption that the loss random variable Y has a hnite second moment. 

We know from Rockafellar and Uryasev (2013) that Qa is a convex, positively 
homogenous, monotonic, and averse functional on for a G (0,1). From Theorem 
3 in Rockafellar and Royset (2014b), see also Rockafellar et ah (2014), we know that 
Qa is bounded as stated next. We adopt the notation cr‘^{Y) = E[{Y — E[Y])‘^]. 

Proposition II.1. For Y E and a G (0,1) one has that 

«„(y) < BIK] +(11,10) 
V1 — a 

Proof. 

Suppose that the quantile qaiX), viewed as a function of the probability level, is 
continuous at a. Let Jq, be the indicator function of the interval [qa(Y), oo). We then 
have by the Schwarz inequality that 

(l-a)fc(y-B[y]) = El(Y-ElY])Q 

< ^E\(Y - E\Y]Y]EWI] 

< a{Y)y/l — a. 

Then, since qa{Y — T^iR]) = qaiX) — fh® result follows from dividing by 1 — a. 

Thus, (II. 10) is valid under the continuity assumption about the quantile, which is 
true for all but at most countable many a. By continuity on both sides of (II. 10) 
with respect to a, it must then hold for all a G (0,1). 

□ 

The measure of regret at probability level a G (0,1) that serves in the context 
of superquantile regression is dehned for any loss random variable R as 

K(R) := 7^ Vo(R), (II.ll) 

1 — a 
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where 

Vo{Y)-.= [ max{0,q^{Y)}d/3. (11.12) 

Jo 

These expressions appear in Rockafellar and Royset (2014b), where their discovery, 
which is related to the Hardy-Littlewood transform, is described. Here, we provide 
the alternative, direct proof of Rockafellar et ah (2014), on how these expressions 
lead to the superquantile as optimal solution of (II.7). We start, however, with two 
preliminary results and the dehnition of a corresponding measure of error. 

Lemma II. 1 . For Y G C^, 

Vo(h") < (t(Y) + max{0, E[Y] + a{Y)}. (11.13) 


Proof. 

From (II. 10) and (11.12) we have 

Vo(h^)< [ max{O,0y(/3)}d/3 for eyi/J) = E[Y] + ^^a{Y). (11.14) 

Jo V I ~ P 

We consider three cases. In Case 1, we suppose that 0y{I3) > 0 for all (3 G [0,1]. 

Then the right hand side of (11.14) is given by 

[ 9Y{P)dp = E[Y]+a(Y) [ {l-P)-^/^d/3 with [ {1 - P)-^/^d/3 = 2. (11.15) 
Jo Jo Jo 

Therefore, VoCF) < E[Y] + 2a{Y) in Case 1. In Case 2a, we suppose that 0y{/3) < 0 

for all G (0,1). Then obviously VoCF) < 0. Finally, in Case 2b, let 0y{/3) < 0 

for some /3 G (0,1), but not all. Then necessarily criY) > 0 and E[Y] < —(j(Y), 

and 0y{(3) strictly increases with respect to [3. Let a be the unique (3 G (0,1) with 
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Oyia) = 0, namely when y/1 — a = a{Y)/[—E\Y]). Then we have that 


max{0,6*y (/3)}(i/3 = / 9Y{f3)d/3 


= {l-a)E[Y] + <j{Y) / (1 - 

J a 

= {l-a)E\Y] + 2a{Y)^/l^ 


B|y]2' 

o(y? 

-E\Y] 

< <^(n- 


E[Y] 


Thus, in Case 2b we get Vo(b^) < criY). The conclusion then follows by putting 
together the cases. 

□ 

We observe that for a G (0,1), Va is also a convex, positively homogeneous, 
monotonic, and averse functional on C^, which follows from the properties of the 
superquantile (Rockafellar & Uryasev, 2013), and by the above result it is also hnite, 
and consequently continuous. A corresponding measure of error is dehned for Y ^ C? 
by 

■.= My)-E[Y] (11.16) 

and referred to as a superquantile error. Obviously, is also convex and positively 
homogeneous. It also satishes the following properties. 


Proposition II. 2. For any a G (0,1) and Y G a superquantile error satisfies 

(a) BaiY) = 0 when Y = 0, 

(b) BaiY) > 0 when Y ^ 0, and 

(c) BaiY) a/{I - q)}\E\Y]\. 

Proof. 

Since qyit)) = 0 for all G [0,1], (a) follows trivially. 
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Since Vq, is averse, we have that for Y ^ C? Sa{Y) = Va{Y) — E[Y] > E[Y] — 
E[Y] =0 when Y is not a constant. To complete part (b), we therefore only need to 
consider nonzero constants. If F is a positive constant K, then 
1 


1 — q; 


max{0,qis{Y)}d/3 — E[Y] > / ma.x{0, qp{Y)}d/3 — E[Y] 

Jo 

> K-E[Y] 

> 0 . 


If y is a negative constant K, then 

- [ max{0, qj3(Y)}d/3 — E[Y] = - [ max{0, K}d/3 — E[Y] 

1-a Jq 1-a Jo 

= 0-E[Y] 

> 0 , 

which completes part (b). 

Since qjsiY) > E\Y] for all /3 G [0,1], we have whenever E\Y] > 0 the bound 


1 — a 


max{0, q0{Y)}df3 — E[Y] > 


> 


1 — a 
a 

1 — a 


m&x{Q,E\Y]}dlJ - E[Y] 


E[Y]. 


And when E\Y] < 0, 


— [ max{0, qfi{Y)}d/3 - E[Y] > —— [ max{0, E[Y]}d/3 - E[Y] 
1-a Jo 1-a Jo 

> -E[Y]. 


Part (c) then follows by combining the two results. 

□ 

By Proposition II.2 and the above discussion, Sa is a regular measure of error. 
We now show that a superquantile is a unique optimal solution of optimization prob¬ 
lems involving Va and As mentioned, the connection between a superquantile and 
Vq, is also reached in Theorem 7 of Rockafellar and Royset (2014b) through different 
means. Here we derive the direct proof and the connection with a superquantile error 
(see Rockafellar et ah, 2014). 
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Theorem II.1. (Superquantile as optimal solution) For Y ^ C? and a G (0,1), 

qa{Y) = argminjco + Va(T - Co)} 

co&R 

= argmin£Q,(F — Co). (11-17) 

co&R 

Proof. 

Let (p{c) = c + Vq,(T — c) and t/’^(c) = max{0,g^(F) — c}. These are both convex 
functions of c, and tjjp is nonincreasing. We can use the criterion that 


c G argmin 99 (c) 9 ?+(c) > 0, V 9 '_(c) < 0, 


where, because of the monotonicity of ipp, 


F+{c) — 1 + 

= 

Therefore 


1 — a 


{^/jfiy_{c)d/3, 


-1 if q 0 {Y) > c, 
0 if g^(F) < c, 


<^'_(c) = 1 + 


M-{c) = 


1 — a 


{^/J|3)y{c)d|3, 


-1 if q 0 {Y) > c, 
0 ifg^(y)<c. 


{'il)p)y{c)dj3 = / (tl)p)'_{c)dj3 


= -( 1 - 7 ) forc = g^(F), 


in which case (^/’/ 3 )'(c) = (^/ 3 )+(c) = (^/ 3 )'_(c) = l-(l- 7 )/(l-ci;). Thus, (^/’/ 3 )'(c) = 0 
corresponds to c = ^^(T) for 7 = 0 . Consequently, the first equality of the theorem 
holds. The second follows directly from (11.16) and the fact that a constant in an 
objective function is immaterial with regard to the argmin. 

□ 


The foundations for quantile regression are given by equations (II.6) and (II.8). 
Analogously, the expressions in (11.17) provide the path to superquantile regression 
as developed in Section II.B. In fact. Theorem II.1 shows that (jaiX) is the uniquely 
“closest” scalar to Y in the sense of the superquantile error. The optimal value in 
(11.17) dehnes a measure of risk (see Rockafellar & Royset, 2014b) 

HaiY) := min{co + V„(R - Co)} 

cosi? 

= qaiy) + My -UY)) 
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for F G analogously to qaiX) in (II-7)- A corresponding measure of deviation is 
given by 

Va{Y) := mmSa{Y - Co) 
coG-R 

= n^{Y)-E[Y]. (11.18) 

We note that parallel to (II.2) (see Rockafellar & Royset, 2014b), 

n^{Y) = ^ [\^{Y)d/3 
1 ^ J a 

and, consequently, 

VUY) = -^ f qf,{Y)d/3 - E[Y]. 

The measures of regret, error, risk, and deviation, Vq,, Sa, TZa, and Va, respectively, 
for a probability level a G (0,1), form a family of risk quadrangles, in the sense of 
Rockafellar and Uryasev (2013), that corresponds to the superquantile as the statistic, 
as shown below. 


Superquantile-based Quadrangle: (at any probability level a G (0,1)) 

Error measure = Tq,(F) = max {0, qi^iY)} djd — E[Y] 

Risk measure = TZaiY) = qa(Y) = a-second-order superquantile 
Deviation measure = Va(Y) = qa(Y) — E[Y] 

Regret measure = Va{Y) = jW j^iax {0, q^(Y)} dl3 
Statistic = i5(F) = qa(Y) = a-superquantile 


We note here that the measure of deviation Va plays a central role in the 
remainder of the dissertation as it facilitates simplihcations, goodness of £t tests, and 
computational methods. 
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B. SUPERQUANTILE REGRESSION 

Theorem II. 1 and the development leading to qnantile regression direct us 
to a new regression methodology that is centered on a superquantile error. The 
next subsection poses the regression problem, provides its properties, and discusses 
stability under perturbations. The section ends with a discussion of superquantile 
tracking. 

1. Superquantile Regression Problem 

While Theorem II. 1 shows that the “best” scalar approximation of a random 
variable Y in the sense of a superquantile error is the corresponding superquantile, 
we now go beyond the class of constant functions to utilize the connection with an 
underlying explanatory random vector X. We focus on regression functions of the 
form 

f{x) = Co + (c, h{x)), cq e ]R,c e 1?™, 

for a given “basis” function h : IR^ —)■ This class satishes most practical needs 

including that of linear regression where m = n and h{x) = x. Extensions beyond 
this class are also possible but not dealt with in this dissertation. 

We now dehne the Superquantile Regression Problem SqR, for any h : R^ —)■ 
R!^ and a G (0,1), where 

Z(co,c) :=F-(co + (c,h(X))) 

is the error random variable, whose distribution depends on cq, c, h, and the joint 
distribution of {X,Y). We denote by C C R^~^^ the set of optimal solutions of SqR 
and refer to (cq, c) G C as a regression vector. 


Superquantile Regression Problem: 

SqR: min (Z(co, c)) = —*^— [ miix{0,qfj{Z{co,c))} d/3 - E[Z{co,c)]. 

co£R,c£R'^ i — a Jq 




The objective function Sa{Z{-,-)) is well-defined and finite when the distri¬ 
bution of {X,Y) and h is such that Z(co,c) E C} for all cq G -K, and c G IR^. A 
sufficient condition that ensures this property is that Y, hi{X),hm{X) E . We 
adopt the notation 


H = h{X), Hi = hi{X), i = 1, 2,m. 

Lemma II. 2. IfY, Hi,H^ E C?, then Z{cq, c) E C? for all Cq G M, and c E -K™. 

In surrogate estimation, cq -t- {c,h{X)), with (co,c) G C, provides the best 
approximation of Y in the sense of a superquantile error. For example, after having 
computed (cq, c), the analysis could proceed with examining the moments, quantiles, 
and superquantiles of Cq -|- (c, h{X)) as surrogates for the corresponding quantities of 
Y. If X is Gaussian and h is affine, then cq -|- (c, h{X)) is a Gaussian approximation 
of Y easily examined and utilized in further studies. It may also be of interest to 
examine Cq -|- (c, h{X)) under hypothetical distributions of X. 

As a direct consequence of the Regression Theorem in Rockafellar and Uryasev 
(2013) (see also Theorem 3.1 in Rockafellar et ah, 2008), we obtain that a regression 
vector can equivalently be determined from a measure of deviation 

Proposition II.3. Suppose that Y, Hi,Hm E C? . Then, the set of regression 
vectors C of SqR is equivalently obtained as 

C = \ (co,c) G 1?™+^ I c G argminT'a(Zo(c)), cq = g„(Zo(c))| , 

L cei?™ J 

where Zo{c) ■=Y — (c, h{X)). 

Proposition II.3 implies computational advantages as the {m + l)-dimensional 
optimization problem SqR is replaced by a problem in m dimensions with a simpler 
objective function, which we fully utilize in Ghapters III and IV. Moreover, the result 
also proves to be benehcial in analysis of regression vectors. 
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We now define the Deviation-based Superquantile Regression Problem DSqR, 
for any h : IR^ —)■ IR!^ and a G ( 0 , 1 ): 

Deviation-based Superquantile Regression Problem: 

DSqR-. min V^{Zq{c)) = -^ [ qp{Zo{c))d/3 - E[Zo{c)], 

1 - a 

with Co being obtained by setting cq = ga(-^o(c)). 

The existence of a regression vector is ensured by the next result, which also 
provides conditions for uniqueness. 

Theorem II.2. (Existence and uniqueness of regression vector) IfY, Hi, G 
then SqR is a convex problem with a set of optimal solutions C that is nonempty, 
closed, and convex. 

(a) C is bounded if and only if the random vector X and the basis function h satisfy 
the condition that (c, h{X)) is not constant unless c = 0. 

(b) If in addition, for every (co,c), (co,c') G with c ^ d, there exists a /3o G 

[ 0 , 1 ) such that 

0 < qg{Z{cQ, c) + Z{do, c)) < qg{Z{co, c)) + qg{Z{cQ, c)) (11.19) 

for all (3 G [do, 1), then C is a singleton. 

Proof. 

Since F G implies that Sa(Y) < oo, by Lemma II.1, we deduce the two first 
conclusions from Theorem 3.1 in Rockafellar et ah (2008). Hence, we only need to 
show that C is a singleton. 

Suppose for the sake of a contradiction that (co,c), (cq,c') G C and (co,c) 7 ^ 
(cq,c'), with corresponding optimal value ^ > 0 , i.e., f = Sa{Z{co,c)) = Tq,(Z(cq, c')). 
We consider two cases. 

First, suppose that ^ = 0. By Proposition II.2, Z(co,c) = Z(dQ,d) = 0 and 
consequently 

Co + (c, H) = Cq + {d, H), 
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which implies that (c — c', H) = Cq — Cq. Under the assumption that (c, h{X)) is only 
constant when c = 0, we must have that c — c' = 0. Then, also Cq — cq = 0 follows, 
which contradicts the hypothesis that (co,c) 7 ^ (co,c'). 

Second, suppose that > 0. If c = c', then a direct consequence of Propo¬ 
sition II.3 and the fact that every random variable has a unique superquantile at 
each probability level, is that also cq = Cg, which again contradicts our hypothesis. 
Consequently, we focus on the case with c 7 ^ c', for which there exists a /3o such that 
(11.19) holds for all (3 G [/do, !)• Trivially, then 

max{ 0 ,g/ 3 (Z(co,c) -+- Z(co,c'))} < max{0, g/ 3 (Z(co, c))} -h max{0, g/3(Z(co, c'))} 

for /d G [/3o, !)• If /5 G (0,1) is such that c) -|- 2’(cg, c')) < 0, then 

max{0,g/3(^(co,c) •+• Z(co,c'))} < max{0, g/3(2'(co, c))} -h max{0, g/3(Z(co, c'))} 

as the left-hand side vanishes and the right-hand side is nonnegative. Hence, 

1 

max{ 0 , qf}{Z{cQ, c) -h ^(cg, c'))}dp 

< / max{ 0 , g/ 3 (Z(cg, c))}(i/d-f / max{ 0 , g/ 3 (Z(cQ, c'))}(i/3 

Jo Jo 

and also 

Sa{Z{Co, c) -h Z{Cq, c')) < £a(Z(Co, c)) £a(Z(Cg, c')). (11.20) 

Let 

(c",c") = (l/2)(cg,c) + (l/2)(c',c') 

and therefore 

2 Z(c",c") = ^(co,c) + Z(c'g,c'). 

By the optimality of the positive homogeneity of Tq,, and (11.20), we hud that 

2^<2S^(Z(c'',c")), 
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and that 


2^,(Z(c",c")) = SU2Z(c'',c")) 

< £a(Z(Co,c)) + £a(Z(cQ,c)). 

Since 

^a(^(Co, c)) + £a(^(Co, c')) = 2^, 

we finally get that 

2^ < 2^, 

which cannot hold. In view of this contradiction, the conclusion follows. 

□ 

While Theorem II.2 gives a sufficient condition for uniqueness of the regression 
vector, in general uniqueness cannot be expected. For example, suppose that the 
random vector (X,V), with X scalar valued, has the possible and equally likely 
realizations (1,1), (2, 2), and (3,1). Then, g^(Zo(c)) = max{l — c, 2 — 2c, 1 — 3c} for 
(3 > 2/3 and E[Zo{c)] = 4/3 — 2c. It is straightforward to show that for a > 2/3, 
any c G [—1,1] minimizes Pq,(Zo(-)). Consequently, in view of Proposition II.3, any 
c G [—1,1], with a corresponding Cq = max{l — c, 2 — 2c, 1 — 3c}, minimizes Tq,(Z(-, •)) 
for a > 2/3, as shown in Figure 5. The minimum error is 2/3. 

A unique regression vector is indeed achieved in the normal case as stated 

next. 

Proposition II.4. Suppose that {H, Y) is normally distributed with positive definite 
variance-covariance matrix. Then, C is a singleton. 

Proof. 

Let S be the variance-covariance matrix of {H,Y), with Cholesky decomposition 
S = LL^. For any fi G (0,1) and c G Zq(c) is also normal with mean E[Zq{c)] = 
(c, E[{H, F)]) and variance a‘^{Zo{c)) = (c. Sc), where c = (—c, 1). Thus, 

qg{Zo{c)) = E[Zo{c)] + kgaiyZo^c)) = E[Zfic)] + A;; 3 ||L’^c||, 
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0 12 3 4 


Explanatory Variable x 


Figure 5. Example of multiple optimal solutions for problem SqR. 


where kp = (/)($“^(/9))/(l — f3), with cj) and $ being the standard normal probability 
density and cumulative distribution functions, respectively. 

For c, c' G iR"*, with c 7 ^ c', there is no constant k > 0 such that (—c, 1) = 
k{—c', 1). Let c = (—c, 1) and c' = (—c', 1). Since E is positive dehnite, the upper- 
triangular matrix is unique and full rank. Consequently, the null space of 
contains only the zero vector and L^(c — fcc') 7 ^ 0 for all scalars k > 0. Since the 
triangle inequality for two vectors holds strictly whenever the two vectors cannot be 
expressed as a positive multiple of each other, we therefore hnd that 

||L^c +Lac'll < ||L^c|| + ||L^5'||. 


Now for the sake of a contradiction suppose that c, c' G IR^ both minimize 
Pa(Zo(-)) and attain the minimum value ^ E M, but c 7 ^ c'. Let 


c" = (l/ 2 )c-|-(l/ 2 )c', c" = (—c",l), and 7 a = / kisd/3/{1 — a) > 0. 
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Then, 

P„(Zo(c")) = ^ /'g^(Zo(c"))ci/3-i?[^o(c")] 

^ ^ J a 

= E[Zo{c")] + 7a||^^c'l| - E[Zo{c")] 

= |^||L^c +Lac'll 

< l^dlL^SII + IILVII), 

and since 

|^(||L^c|| + ||LTc'||) = ^-[e[Zo{c)]+^4L'^c\\-E[Zo{c)]) + 

+ i [e[Zo4)] + 7a||^^5'll - E[Zo{c')]) 

= l(v4Zo{c))) + l(v4Zo{c4) 

= 2 ^^ + 0 
= e, 

we have that 'Dq(Zo(c")) < However, this contradicts the optimality of c, d and we 
reach the conclusion. 

□ 

We next turn to consistency and stability of the regression vector. Of course, 
the joint distribution of {X,Y) is rarely available in practice and one may need to 
pass to an approximating empirical distribution generated by a sample. Moreover, 
perturbations of the “true” distribution of {X, Y) may occur due to measurement 
errors in the data and other factors. We consider these possibilities and let {X'', Y‘') be 
a random vector whose joint distribution approximates that of {X, Y) in some sense. 
For example, (X'^, Y^) may be governed by the empirical distribution generated by an 
independent and identically distributed sample of size v from {X,Y). Presumably, 
as z/ —)■ oo, the approximation of {X, Y) by {X‘', Y'^) improves as stated formally 
below. Regardless of the nature of {X’^, Y^), we dehne the approximate error random 
variable as 

Z"(co,c) ■.= Y^ -Co-{c,h{Xn), 
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and the corresponding Approximate Superquantile Regression Problem as fol¬ 

lows: 


Approximate Superquantile Regression Problem: 

SgR" ■. min SaiZ'" (cq^c)) = - [ max {0, g«(Z'^(co, c))} d/?— 

coerj,cei?’" 1 — a Jq 

-E[Z‘'{co,c)]. 


The next result shows that as {X‘', V^) approximates {X, Y), a regression vec¬ 
tor obtained from SgR' approximates one from SgR, which provides the justihcation 
for basing a regression analysis on SgR^. Below, we let —denote convergence in 
distribution and 

= h(X^), = hi(X’'), i = l,2,...,m. 

Theorem II.3. (Stability of regression vector) Suppose that {X’^,Y'^), v = 1,2,..., 
and {X,Y) are n + 1-dimensional random vectors such that {X'^,Y’') {X,Y) 

and that the basis function h is continuous except possibly on a subset S C J?” with 
P{X G S'} = 0. Moreover, let Hi,Y E C?, E[{H'(Y] < oo, i = 1,2, ...,m, and 

sup,,E[(F^)2] < CX). 

{ido,R)}fLi is a seguence of optimal solutions of SgR!', with a E (0,1), 
then every accumulation point of that seguence is a regression vector of SgR. 

Proof. 

Let (co, c) E ]R^^^ be arbitrary. By the continuous mapping theorem (see for example 
Theorem 29.2 in Billingsley, 1995), 

Z'^{co,c) = Y‘'-co-{c,h{Xn) Z{co,c) = Y-co-{c,h{X)). 

By the assumed moment conditions in (II. 1), there exists a constant M < oo that 
bounds from above the terms 

maxE[|Rj|], maxE[(iLi)^], sup supE[{Hff], 

and E[\Y\], E[Y^], supE[|F'^|], sup.^[(F'^)^]. 
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In view of Lemma II.2 and its proof, we deduce that 


E[{Y^ - Co - (c, <M + 2{\\c\\m^/‘^M + {M + \co\)\\c\\mM) + ||c|pmM (11.21) 

for all u. Hence, Z'^{co,c) is uniformly integrable (for fixed co,c) and 

E[Z-{co,c)] ^ E[Z{co,c)]<oo- (11.22) 


see Billingsley (1995), Theorem 25.12 and its corollary. 

By Theorem 4 in Rockafellar and Royset (2014b), a sequence of random vari¬ 
ables converges in distribution to a random variable if and only if the corresponding 
a-superquantiles, viewed as functions of the probability level a, converge uniformly 
on every closed subset of (0,1). Consequently, qj 3 {Z^{co, c)) —)■ g/ 3 (Z(co, c)) uniformly 
in /d on closed subsets of (0,1). Moreover, since the 0-superquantile coincides with 
the expectation, (11.22) implies that qo{Z’^{co,c)) —)■ qo{Z{co,c)) also holds. These 
facts and the observation that the superquantile of any random variable is continuous 
and nondecreasing as a function of the probability level, ensure that for any e > 0 
and S G (0,1), there exists an integer z/(e, 5) such that for all z/ > z/(e, 5), 

sup \qf}{Z‘'{co,c)) - q 0 {Z{co,c))\ < ^ (11.23) 

/3e[o,i-5] - d) 


Then, 

y^l —<5 

lo 


max |o, g/ 3 (Z'^(co, c))|(i/3 — f max |o, g/ 3 (Z(co, c))|(i/3 


< 


< 


»l-<5 


r-l-<5 


g^(Z^(co,c)) -g^(Z(co,c)) 


d(3 


dl3 


lo 2(1-5) 


e 

< - 
- 2 


(11.24) 


for all u > z/(e, 5). Following an argument similar to that in Lemma 11.1, we hnd that 


'l-(5 


max{0,g/3(Z(co,c))}d/3 < 5^/V(Z(co, c)) 


max 10, SE[Z{co, c)]5^/V(Z(co, c))|. (11.25) 
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Moreover, the reasoning that leads to (11.21) also gives 


E[Z{co, c)] < M + |co| + ||c||mM. 


(11.26) 


These facts show that there exists a positive constant M < oo (which depends on cq 
and c) such that \E[Z{co,c)]\,a{Z{co,c)) < M. Hence, from (11.25), we hnd that 

(11.27) 


'1-5 


max < 0, g/ 3 (Z(co, c)) Id/S < 


Let e < 12M and 5^ = (e/(12M))^. Then, = e/4 and 


' 1 - 5 , 


max < 0, qis{Z{co, c 


e 

< 

- 4 


(11.28) 


An identical result holds for Z^{co, c). Let qp^Z^^CQ, c))^ = max{0, qi 3 {Z’^{co, c))} and 
g/ 3 (Z(co, c))+ = max{0, g^(Z(co, c))}. Consequently, for all z/ > z/(e,(5e), 

^ qp{Z'"{co,c)) d/3 - [ qfj{Z{co,c)) d/3 


< 


rl — Se rl — Se 

/ qp{Z'^{co,c))^d/3 - / qp{Z{co,c))^d/3 

'o Jo 

+ [ qi3{Z‘'{co,c)) d/3 + [ qf){Z'"{co,c)) d/3 


'1-5, 


'1-5, 


e e e 
- 2 ^ 4^4 
< e. 


The fact that i7[Z'"(co, c)] —)■ E[Z{co,c)] < oo and the assumption that (co,c) is 
arbitrary, imply that Sa{Z’^{-, •)) —)■ Sa{Z{-, •)) pointwise on . Lemma II.l and 

the above moment assumptions imply that Ba{Z'^{-^ ■)) and Tq,(Z(-, •)) are hnite-valued 
functions. They are also convex, which follows directly from the convexity of on 
and the affine form of Z'^ and Z as functions of cq and c. Consequently, by Theorem 
7.17 in Rockafellar and Wets (1998), Tq(Z''(-, •)) epiconverges to Ba{Z{-,-)). The 
result then follows from Theorem 7.31 in Rockafellar and Wets (1998). 

□ 
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When the Approximate Superquantile Regression Problem SqR'^ is constructed 
using an independent identically distributed sample of size u from the distribution 
of {X, Y), we obtain the following corollary which follows from the properties of the 
empirical distribution. 

Corollary II.1. Suppose that the basis funetion h is continuous except possibly on a 
subset S C iR" with P{X G S'} = 0 and that Hi,Y G C‘^, i = 1,2, Moreover, 

let {X‘', Y’^) be distributed according to the empirical distribution generated by an in¬ 
dependent and identically distributed sample of size v from the distribution of {X,Y). 
Then, the conclusion of Theorem 11.3 holds. 

We next examine the rate of convergence of regression vectors obtained from 
the approximate problem SqR!' to those of SqR corresponding to the “true” distribu¬ 
tion. It appears difficult to obtain asymptotic distribution theory for superquantile 
regression without additional assumptions, which among other consequences should 
ensure unique optimal solutions of SqR. We prefer another route that leads to a rate 
of convergence result under mild assumptions. 

Quantihcation of the stability of the set of optimal solutions of an optimiza¬ 
tion problem under perturbations depends on a “growth condition” of the problem, 
which is difficult to quantify for SqR-, see Section 7J in Rockafellar and Wets (1998). 
Consequently, we focus on the better behaved e-regression vectors of SqR dehned for 
e > 0 as 

c. :=|(co,oA)eiR”^+i 

with an analogous dehnition of the e-regression vectors of SqR'^ denoted by Cf. The 
rate with which Cf tends to C depends, naturally, on the rate with which {X‘',Y'^), 
underlying SqR!', tends to (X, Y) of SqR in some sense. Before we make a precise 
statement, we introduce a convenient notion of distances between any two nonempty 
sets A, B C For p > 0, let 

dIp{A, B) ;= inf{p > 0|A fl plB C B plB, B fl plB C A-\- rjlB}, 


T„(Z(co,e,Ce)) < min 

coei?,cei?” 


Sa{Z (Co, c)) -|- 
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where IB is the Euclidean ball in with unit radius and center at the origin. 

Roughly, dIp{A,B) is the smallest amount the sets need to be “enlarged” to ensure 
they contain the other one, with an exclusive focus on points no further from the 
origin than p. This restriction facilitates the treatment of unbounded sets. 

As we see next, the rate of convergence is directly related to the rate with 
which the random vector 

:= {H'^ - H,Y’' -Y), 

describing the approximation error, tends to zero. 

Theorem II. 4. (Rate of convergence of regression vector) Suppose that {X’^^Y'^), 
u = 1,2,..., and {X,Y) are n + 1-dimensional random vectors generating SqBh' and 
SqR, respectively. Moreover, let Hi,Y G sup EKHf)"^] < oo, i = 1,2, ...,m, and 
sup^ E[{Y’')‘^] < oo. Let pq > 0 be such that pqIB fl C 7 ^ 0 and p^lB fl 7 ^ 0. 

Then, for p > po, there exist positive constants ki,k 2 , and ks (dependent on 
p) such that for any e > 0 and z/ = 1, 2,..., 

(i + y) Z;[||A‘'|||(A^imtK|o,log(^jj^j|+A^2)+A^3l|B|A1|| 

whenever E'[||A'^||] > 0 and dIp{C’(,Cfj = 0 otherwise. 

Proof. 

By Theorem 3(a) of Rockafellar and Royset (2014b), for (3 G [0,1), 

qp{Z''{cQ,c)) - qp{Z{co,c)) < ^^E[|Z^(co, c) - Z(co, c)|], 

and since 

^B[|Z7c„,c)-Z(c„,c)|| = ^B||(c,A-)|], 

we get that 

qp{Z^{co,c))-qp{Z{co,c)) < ^^E[|(c, 

< ^||c||E[||A1|], (11.29) 
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where c = (—c, 1). Then, for S G (0,1), 


^1—S ^1—S 

/ niax{0, g 0 (Z‘'(co, c))}d/3 — / niax{0, g/s(Z(co, c))}dj3 

Jo Jo 

<[ \QfiiZ''ico,c)) - gfi{Z{co,c))\dl3 
Jo 

<||a||B[||A-||]^‘ ‘^dl3 

<-||c||B|||A-||]loga. (lUO) 

Let p > Po and M be an upper bound on first and second moments of {HJ, \H^\, |F|, 
and as in the proof of Theorem II.3. Since |(c, iL)| < ||c|| YJT=i 1-^*1 (c,< 

||c|p we hnd that E[\{c,H)\] < ||c||mM and E[{c,H)‘^] < ||c|pmM. Con¬ 

sequently, 

E[{Y-co-{c,H)f] < E[{Y-coY] + 2\E[{Y-co){c,H)]\+E[{c,H)^] 

< M + 2{\\c\\m^^‘^M + [M + |co|)||c||mM) -|- ||c||^mM. 

(11.31) 


Then, for ||(co,c)|| < p, it follows by (11.26) that 

\E[Z{co, c)]| < M + p + pmM 

and by (11.31) that 

/ N 1/2 

(t(Z(co, c)) < [M + 2 (ypnJ'’^M + (M -|- p)pmM') + p^mMj , 

with identical bounds for \E[Z^{co,c)]\ and a{Z’^{co,c)). Let Mp be the larger of the 
two previous right-hand sides. 

By (11.25), analogously to (11.27), we have that for ||(co,c)|| < p, 

[ max{0, (jis^Z{co, c))}d/3 < SMpS^^'^ (11.32) 

Jl-5 

and similarly with Z(co,c) replaced by Z'^(co,c). 
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We also find that for ||(co,c)|| < p, 


E[Z"(co,c)]-E[Z(co,c)] 


(c,B[A'']) 

< ll?lll|B|A1ll 

< (l + rtl|B[A 


(11.33) 


Then, collecting the results of (11.30), (11.32), and (11.33), and for ||(co, c)|| < p, 
we obtain 

£a(^‘'(co, c)) - T„(Z(co, c)) 


< 


< 


max{0, q/ 3 (Z’'(co, c))}d/3 — / niax{0, gp(Z(co, c))}(i/d 


ElZ‘'(co,c)]-ElZ(co,c)] 

I 

r*l—5 ^1—S 

maxjO, gp(Z'^(co, c))}(i/3 — / niax{0, g/ 3 (Z(co, c))}(i/3 


maxjO, gp(Z'^(co, c))}(i/3 + / niax{0, gp(Z(co, c))}(i/d 


'1-5 


'1-5 


+ E[Z‘'{co,c)]-E[Z{co,c)] 

< -(1 + p)i?[||A^||] log<5 + + (1 + p)||i?[A'^]||. 


(11.34) 


We next determine the choice of 5 G (0,1) that minimizes the previous bound and 
consider two cases. First, if 


0<lp(B|||A''||])"<l, 


with 


kp : = 


2(1+ P) 
GMr, 


then differentiation gives that the bound is minimized with 6 = /cp(i?[|| A'^H])^. Second, 
if 


then 


kp{E[\\A'^\\]f>l, 
4(l + p)E[||A^ 


Mp< 
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and the bound 


- (1 + p)E[|| A-ll] logi + + (1 + pjllEIA-lll 

< -(1 + p)i![||A-||] logi + 4(1 + p)£|||A''||]^''^ + (1 + p)|| J![A1||, 

for any 6 G (0,1). Consequently, combining the two cases, there exist constants ki, 
/c 2 , and ks (which depend on p), such that for ||(co,c)|| < p, 

£a(Z‘'(Co, c)) - £a(Z(Co, c)) 

< kiE[\\A''\\] max |o, log } + ^2^[ll^1l] + ^3||^[A'']|| 

<^[||A''||] (^kimax |o,log ^ | + ^ 2 ^ + ^3||^[A'']||- 

Direct application of Example 7.62 and Theorem 7.69 of Rockafellar and Wets (1998) 
then yields the conclusion for i7[||A'^||] > 0, where the additional coefficient (l + 4p/e) 
originates in that theorem. Finally, if i7[||A'^||] = 0, then, in view of (11.29) and the 
fact that this implies that ||i7[A'^]|| = 0, we hnd that for ||(co,c)|| < p, 

\SaiZ''{Co,c)) - £a{ZiCo,c)) \ = 0 . 

The final conclusion then follows by again invoking Example 7.62 and Theorem 7.69 
of Rockafellar and Wets (1998). 

□ 

Theorem II.4 shows that the distance between and is almost proportional 
to i7[||A'^||], but with a minor correction by a logarithmic term. If the approximation 
is caused by measurement errors of magnitude 1/z/, i.e., the absolute value 
of each component of [X^ — X,Y'^ — Y) is no greater than 1/u almost surely, then 
E'[||A'^||] < y/m + l/u and the expressions can be simplihed. For .^ > 0, log a; < 
for sufficiently large x E M. Consequently, for any ^ G (0,1) and sufficiently large z/, 

£ (i + 7) 77. 
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where /c > 0 can be determined from ki, k 2 , k^, and m. That is, the Enclidean 
distance between an e-regression vector of SqR’^ to one of SqR is for ^ G (0,1) 

arbitrarily close to zero. 

2. Superquantile Tracking 

We next tnrn to the sitnation where the primary concern is with the conditional 
loss Y[x) given that the explanatory random variable takes on specihc valnes, X = x. 
We seek to estimate qa(Y{x)) for x G J?”, or a snbset thereof, with the goal of 
eventnally minimizing, at least approximately, qa{Y{x)) by a jndicious choice of x. 
Of course, with incomplete knowledge about the distributions ofY{x) this is a difficult 
task that can be achieved only approximately. For example, there is no guarantee 
that a regression function f = Co + (c, h{-)), with (cq, c) G C obtained by solving SqR 
using a G (0,1), tracks qa(Y{x)), i.e., f{x) = qa(Y{x)) for all x G The hope 
of such “exact” superquantile tracking becomes even less realistic when SqR must 
be replaced by an approximation SqR'^ as typically required in practice. However, 
“local” superquantile tracking is possible, at least approximately, as stated in the 
next proposition. Moreover, tracking is achieved under certain model assumptions. 
For example, if we have that Y = co + (c, X) + e, for some Cq G J?, c G J?”, and where 
e is independent of X, then superquantile tracking is guaranteed; see Theorem 5.1 in 
Rockafellar and Royset (2014a). 

Here we consider the situation where there is a sample of Y[x) for some values 
of X, but this sample is not large enough to allow pointwise estimation of qa(Y{x)) 
for every x of interest. There may even be no x for which there are multiple sample 
points of Y (x). Concentrating on a particular x G iR", we hope to estimate qa(Y (£)) 
by using samples from Y[x) for x near x, weighted appropriately. The weights should 
be nonnegative, sum to one, and can be thought of as an artihcially constructed 
probability distribution associated with the sample. Specihcally, suppose that Xi, i = 
l,...,z/, are the points where the sample is observed and ?/*, i = l,...,z/, are the 
corresponding realizations of Y{xi). When estimating a superquantile at x, we put 
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more “trust” on sample points taken near x and consequently the weight of {xi,yi) may 
be inversely proportional to \\Xi — i;||, with an appropriate adjustment if x coincides 
with an Xi. 

A justihcation for the approach follows directly from Theorem II.3 throngh 
the next proposition. 

Proposition II.5. Suppose that the assumptions of Theorem II.3 hold and that 
the probability distribution of {X, Y) is degenerate at x G in the sense that 

P{(X, F) < (x,?/)} = (piy), for all y E M and x > x, where ip{y) = P{Y{x) < y}, 
and P{{X,Y) < {x,y)} = 0 otherwise. 

//{(cq, is a sequenee of optimal solutions of SqR', with a G (0,1), then 

along every convergent subsequence we have that Cq + {c’^,h{x)) tends to qa(X{x)). 

Proof. 

For the given degenerate distribution of (X, Y), cq + (c, h{X)) = cq + (c, h{x)) almost 
surely. Consequently, SqR rednces to the error minimization problem of Theorem 
II.1 and Co + {c,h{x)) = qa(Y{x)) for every (co,c) G C. The conclusion then follows 
from Theorem II.3. 

□ 

Suppose that the weights of {xi,yi), i = 1,2, in the above construction 
are chosen to approximate the degenerate distribution of Proposition II.5, for example 
by setting them inversely proportional to \\Xi — a;||. Then, in view of Proposition II.5, 
a solntion of SqR’^, constructed using those weights as an artihcial probability distri- 
bntion for {X’^,Y^), leads to an approximation of the considered snperquantile at x. 
Of course, this procedure can be repeated for different points x to generate a “global” 
assessment of qa{Y{x)) as a fnnction of x and eventually facilitate optimization over 
X. Moreover, the process can be repeated with new or augmented sample points in 
a straightforward manner. In a sitnation where a sample is not fnlly randomly gen¬ 
erated but x-points are determined by an analyst, the approach may even motivate 
scattering those points near a point of interest x instead of concentrating them all at 
X exactly. The former approach certainly results in a better “global” nnderstanding 
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of a superquantile as a function of x, but may prove to be a more economical route to 
estimate a superquantile at x too. We examine this situation numerically in Chapter 

IV. 


C. VALIDATION ANALYSIS 

Regression modeling must be associated with means of assessing the goodness 
of £t of a computed regression vector. The process of validating a regression fit is 
important as it allows us to decide whether the obtained numerical results quantify 
how well the model explains and predicts future outcomes. A commonly used mea¬ 
sure that allows such assessment is the coefficient of determination. In least squares 
regression, this coefficient, also known as R-squared, is defined as 




^^Res 


where S'S'^es denotes the residual sum of squares and SS^ the total sum of squares. 
While cannot be relied on exclusively, it provides an indication of the goodness of 
fit that is easily extended to the present context of superquantile regression. In our 
notation. 


R-^ = 1 


E[Z{co,c) 

a^{Y) 


(11.35) 


and similarly when passing to an approximate random vector From Ex¬ 

ample 1’ in Rockafellar and Uryasev (2013), we know that the nnmerator in (11.35) 
is a measure of error applied to Z(co,c) and that its denominator corresponds to 
the measnre of deviation n^(-). Moreover, the minimization of that error of Z(co,c) 
results in the least sqnares regression vector. According to Rockafellar and Uryasev 
(2013), these measures of error and deviation are in correspondence and belong to a 
family of risk quadrangles that yields the expectation as its statistic. Therefore we 
could write the formula for R^ as follows 


_ T(Z(co,c)) 
ViY) 


(11.36) 
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This observation motivates the following dehnition of coefficient of determination 
applied to quantile regression. 


Definition II. 1. In quantile regression, the coefficient of determination of a regres¬ 
sion vector (co, c) G is given by 


where 2’(co, c)+ 


^a(co,c) 


Ta(^(co,c)) 

V^{Y) 

^ ^)+ + ^ (*^0’ 

uy) - e[y] 


= max{0, Z{cq, c)} and Z{co, c)_ = max{0, —Z(co, c)}. 


(11.37) 


In least squares regression, the coefficient of determination is a value expressed 
between zero and one, which leads us to the following proposition. 


Proposition II. 6. For a regression veetor (cq, c) G R^^^ 
that 


0 < Rl{co,c) < 1. 


and a G (0,1), one has 
(11.38) 


Proof. 

By the dehnition of coefficient of determination in quantile regression and of quantile 
error and deviation measures, in the sense of Rockafellar and Uryasev (2013), we have 
that 


Rl{co,c) = 1 


= 1 


= 1 - 


Sa{Z{co,c)) 

VaiY) 

S^{Y-co-{c,h{X))) 
min SaiY-^) 

S^{Y-co-{c,h{X))) 


(11.39) 


T„(F-e) 

where is an optimal solution to min^gi? £a{Y — ^). Both quantile error and 
deviation measures are nonnegative quantities, which proves the upper bound. Since 
the regression vector (cq, c) G R^~^^ is obtained by 


min Sa (Y -Co - (c, h{X ))), 

(co,c)eR"*+i 
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we guarantee that 

S^{Y -co-{c,h{X))) < 
in equation (11.39), which gives i?^(co,c) > 0. 

□ 

Applying the same idea to superquantile regression, we obtain the following 
dehnition of the coefficient of determination. 


Definition II. 2. In superquantile regression, the coefficient of determination of a 
regression vector (cq, c) G is given by 


Rlico,c) 


Sa{Z{co,c)) 

VaiY) 

ih Jo g/3(^(co, c))} d/3-E [Z(co, c)] 

- i?[r] 


(11.40) 


In fact, a similar dehnition can be formulated for any generalized regression 
consisting of minimizing an error of Zf. 

The bounds on the coefficients of determination for least squares and quantile 
regressions, can also be applied to the coefficient of determination in the superquantile 
regression case, using the same arguments as in the previous proof. 

Proposition II. 7. For a regression vector (co,c) G and a G (0,1), one has 

that 

0 < Rlico:C) < 1. (11.41) 

Proof. 

By the dehnition of coefficient of determination in superquantile regression and of 
superquantile error and deviation measures, in the sense of Rockafellar and Uryasev 
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(2013), we have that 


Rlico,c) 


Sa{Z{co,c)) 

Va{Y) 

S^{Y-co-{c,h{X))) 
min Sa{Y-Cj 

B^{Y -Co-{c,h{X))) 


(11.42) 


where is an optimal solution to min^gK BaiY — B)- Using the same arguments as 
in the proof of Proposition II.6, we arrive at the conclusion. 


□ 


As in the classical case, higher values of R\ are better, at least in some sense. 
Indeed, SqR aims to minimize the error of Z{co, c) by wisely selecting the regression 
vector (co,c) and thereby also maximizes R^, 


argmin£^a (Y — [cq + (c, h{X))]) argmaxi?^(co, c). (11.43) 

Co,C Cq,C 

The error is “normalized” with the overall “nonconstancy” in Y as measured by its 
measure of deviation to more easily allow for comparison of coefficients of determina¬ 
tion across data sets. 

However it is possible to obtain large coefficients of determination by adding 
explanatory terms to a regression model, i.e., increasing m, but without necessarily 
achieving a more useful model. Hence, it is usual in least squares regression to also 
evaluate an adjusted coefficient of determination that penalizes any term added to 
the model that does not reduce variability substantially. This quantity only increases 
if a new term reduces SS^es/{^ — m — 1) as seen by the dehnition 

p2 SS^^R{u-m-l) 

SSt/{i^ - 1 ) 

where u is the number of observations. Naturally, then, we dehne an adjusted coeffi¬ 
cient of determination for quantile and superquantile regressions similarly in the case 
where the distribution of {X, Y) has a hnite support of cardinality u. 



48 



Definition II. 3. In quantile regression, the adjusted coefficient of determination of 
a regression vector (co,c) G is given by 


-^o,Adj c) 


ga(^(co,c))/(z/ - m - 1) 
P,(F)/(z/-l) 


(11.45) 


Definition II. 4. In superquantile regression, the adjusted coefficient of determina¬ 
tion of a regression vector (cq, c) G is given by 


-^o,Adj c) 


ga(^(co,c))/(z/ - m - 1) 

V^{Y)/{u-l) 


(11.46) 


Again, similar expressions are available for other generalized regression techniques. 

When performing least squares regression analysis, we have other commonly 
used validation methods. These include computing the Cook’s distance for each 
observation used in the model, which provides an estimate on how an observation 
influences the obtained regression fit. This distance allows the decision maker to 
easily understand which observation might be considered an outlier and which should 
be checked for validity. In least squares regression, the Cook’s distance for observation 
i is defined as 


(f(x)-fHx)Y {f{x)-ff-\x)y 

' mMSE m(f(X)-Yf ' ' ’ 

where /^*^(-) represents the fitted regression function without observation i, and MSE 
denotes the mean square error of the regression model. Using the measure of error 
that is corresponding to the expectation as the statistic, and similar to the approach 
in (11.36), we could write the formula for the Cook’s distance in (11.47) for observation 
i as follows 


£(f(X)-f»(X)) 

' ■ m £(f(X) - y) 

With this in mind, we next define Cook’s distances applied to quantile and superquan¬ 
tile regressions. 
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Definition II. 5. In quantile regression, the Cook’s distance estimates for a regression 
vector (co,c) G is given by 


c) 


g.(/(X)-/h)(X)) 
m ^„(/(X) - Y) 

E fe{/W - + {/W - 

mE [j^ifiX) - + {/(X) - F}_] 


(11.49) 


where = max{0, Y} and IC = max{0, —Y}. 


Definition II. 6. In superquantile regression, the Cook’s distance estimates for a 
regression vector (cq, c) G is given by 


Ei^CiicQ^ c) 


4(/(X)-/W(X)) 

m Sa{-Z{co,c)) 

jX niax{0, qMjX) - f^HX))}dp - EjfjX) - /W(X)] 

Jq max{0, g«(-Z(co, c))}d/3 + ElZ(co, c)] 

(11.50) 


As shown in Section II.B, we only use one assumption when building our 
superquantile regression problem, finite second moments for the random variables. 
This generalization allows the regression problem to be applied in many situations, 
but makes validating the obtained model a harder process. 

For the scope of the dissertation we do not develop other model validation 
techniques since we discard many of the commonly used model assumptions, such 
as normality, or homoscedasticity, that are usually requirements for such assessment 
tests. However, we recall that cross-validation is a tool to take into account for 
validating the regression model, especially for larger sample sizes z/. Obviously, when 
the sample size is small and we choose a high probability level a, subdividing the 
sample into training and testing data sets is not a wise decision. 

In the next chapter, we develop computational methods that allow us to im¬ 
plement these theoretical results. 
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III. 


COMPUTATIONAL METHODS 


In this chapter, we develop computational methods that allow us to solve the 
superquantile regression problems of Section II.B. This computational task consists of 
solving the convex optimization problem SqR, or in practice the approximate problem 
SqR'^ due to incomplete distributional information. 

In the next two sections, we describe convenient means for solving the su¬ 
perquantile regression problems when has a discrete joint distribution with 

z/ possible realizations. Regardless of the distribution of a reformulation of 

the approximate problem SqR'^ in terms of the deviation measure Va is benehcial. In 
view of Proposition II.3, the task of determining a regression vector (cg, c‘') reduces to 
that of minimizing Pq(Zq(-)), obtaining c'^ as an optimal solution, and then setting 

Since it is straightforward to compute every superquantile of a random variable 
Y with a discrete probability distribution, as follows 

V 

E VjVi if a = 0, 

i=i 

( i \ V i—\ i 

E p* - « h/* + E VjVj if E Pi < « < E Pi < 

i=i j i=*+i J i=i i=i 

Vv if a > 1 - pi.. 


with pj being the corresponding probabilities of the realizations of R, which are 
ordered from smaller to larger, we only focus on the minimization problem, which 
takes the following form: 
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We denote these computational methods by primal methods since we compute 
the regression vectors solving the original problem. The material in Section III.A is 
to a large extend based on our paper Rockafellar et al. (2014). 

In Section III.B, we use a different approach that relies on the dualization of 
risk and using the theory developed in Rockafellar and Royset (2014c), we generate 
a computational method that we denote as the dual method. 

A. PRIMAL METHODS 

The next subsections describe two primal computational methods for solving 
DSqR^. The hrst one solves our problem by analytical integration, while the second 
one utilizes numerical integration techniques. 

1. Analytical Integration 

At hrst one might get the impression that numerical integration is required 
for solving DSqR'^, but this may not actually be needed as we show next. Suppose 
that has a discrete distribution with support j = 1,2, ...,z/ and 

probability of occurring P{{X^,Y'^) = {x\y^)} = Ijv for j = l,2,...,z/. This is the 
case we typically encounter in applications, where (x^,y^), j = l,2,...,z/, is the data 
assumed to be equally likely to occur. We then obtain signihcant simplihcations in 
the approximate regression problem DSqR'. 

For any hxed c G the cumulative distribution function of Zq{c) is a 

piecewise constant function with at most u steps. The range of the distribution 
function is {0, 1/u, 2/z/,..., 1} or a subset thereof. By partitioning the integral over 
in DSqR^ according to this range, and accounting for the fact that the integral starts 
at a, we can then rewrite the optimization problem in this case as 

min -1- y r qf(Z-(c))dfi - (III.l) 

c£H i — U J ft. . 

Hl—1 

where z/q, := [z/a], with [a] being the smallest integer no smaller than a & M, 
= a, and /3i = ijv, for i = z/^, z/« + 1,..., v. 
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We recall that 


qa(y) e argmin {cq + V„(F - Cq)} 

Cq^R 

ga(y) = min {cq + V„(F - cq)} . 

co&R 

Consequently, 

5fl(Z„“(c)) = mml7^ + ^ B[mtK{ZS(c)-C/^,0}| (IIL2) 

= W(Zo(c)) +£:[max{ZJ(c)-g,)(ZJ(c)),0}| 

for each [5 G [0,1). 

The special piecewise-constant structure of the cumulative distribution func¬ 
tion of Zq{c) implies that qp{ZQ{c)) is constant as a function of (3 on the intervals 
(/3i_i,/3i), for every i = + 1, •••, Consequently, Up, for (3 G (a, 1) in equation 

(III.2) can be replaced by a hnite number of variables so that equation (Uhl) takes 
the form 


I fPi ( 1 

min - > / min ( C, H--i?[max|Zn(c 

cei?™ \ - a ^ VidR \ I- (3 

% — Va 


V.o}] 


- E[zpc)]. 


The last integral simplifies further since for (3 G {(3^-1, (3^) = (1 — 1/z/, 1), we have 
that 


9/3(^o(c)) = M{c) := . max - {c,x^), 


and therefore 
1 


1 — q; 




mm 

U^&R 


u. + Y3^^[max{Zo"(c) - C^O}] ) d/3 


1 — a 


M{c) 


M{c) 


//3.-1 ^(1 - «) 


Consequently, equation (Uhl) takes the form 


r3^ 


mm 


ceR™ 1 — q; 


'ft-: 


Z/, min I Cj -h ^ _ ^ £^[max{Zo(c) - Ui, 0}] j dj3 


1-/3 


I pi'7-rrii 
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The order of minimization is immaterial and we can equivalently consider 


^ / (Ui + Y^E[max{ZQ{c) - Ui, 0}]'j dl3 

^ i = Ua '' Pi-i \ P / 

M(c) 


mm - 

cei?™, 1 


- E[Z^,{c)l 


z/(l — a) 

where we let U = U^_i) G . 

In order to simplify the notation in our minimization problem, we dehne a^, 
for i = z/a, Va + 1, •••, zz — 1, as follows 

rPi I 


tti ;= 


-dj3 = log(l - A_i) - log(l - A). 


'ft-i 1 - 

Using this notation, equation (Uhl) simplihes even further to 


mm 


cS-R™, 1 — a 


^ u—l _ i/—l 

- /3i-i)Ui + _ £^[max{Zo(c) - Ui,0}]ai 


M{c) 
z/(l — a) 


ElZSlc)]. 


By introducing another set of auxiliary variables and using the standard transcrip¬ 
tion technique for handling max-functions, we reach the linear program P^p that 
implements analytical integration. 
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Problem: 


P, 


LP • 



-1 

v-l 




-1 

u-l 

V 


min 

'.,U,V,W 

1 

1 — a 

M 

1 

/3i- 

l)Ui 

+ 

1 

z/(l- 

^ ^ ttiVij 




i=Vot 




i=Uoc 

i=i 







+ 

w 

z/(l- 

■ a) 

-jyf- 

V 

{c,h{x^))) 

S.t. 

yj 

— (c, h{x^ 

))■ 

- Ui 

< 

Vij , i 

= Va, ■ 

..,1^-1 ,j ■■ 

= l,...,z/ 





0 

< 

Vij , i 

= ■ 

,j : 

= l,...,z/ 



- (C; 

,h{ 

p)) 

< 

w, j ■- 

=1,... 







c 

G 






u - 

“ ) • • • 

,u 

v-l) 

G 






V = 

■ ■ ■ ■} 

Vu- 

-l,v) 

G 

1 








W 

G 

R. 





This equivalent reformulation of DSqR^ involves m + {u — z/q)(z/ + 1) + 1 
variables and 2{u — z/q,)z/ + u inequality constraints. In practice, with the probability 
level a being set close to 1, = [z/a] may be close to the number of observations z/. 

Consequently, the linear programming problem P^p becomes large-scaled when the 
sample size z/ is large and decomposition algorithms may be needed. 

Alternatively, we consider next a numerical integration-based scheme that 
avoids some auxiliary variables and constraints, and also handles the situation where 
the distribution of {X'', Y‘') is not uniformly discrete. 

2. Numerical Integration 

The integral in DSqR^ is easily approximated using standard numerical in¬ 
tegration techniques. Suppose that the interval [a, 1] is divided into fx subintervals, 
where a < Pq < (3i < ... < (3^-1 < (3^ < 3 and Wj > 0,i = 0,1, are factors 
specihc to the integration scheme. An approximation of DSqR^ then takes the form 


55 




Problem: 

^Num ■ mm - E[Z^{c)]. 

i=0 


For large /x, an optimal solution of problem is close to that of DSqR^, 

as seen next, under conditions that are satished by essentially all commonly used 
numerical integration schemes. 


Proposition III.l. Suppose that for any continuous function g : [a, 1] M, a 
numerical integration scheme with discretization points a < (3^ < (di < ... < (d^-i < 
< 1 and weights Wi > 0,i = 0, 1, satisfies 


E 

i=0 


Wig{/3i) - / g{/3)d/3 


0 


as p, —)■ oo. Let be a sequence of optimal solutions of under this 

numerical integration scheme. Then, every accumulation point of is an 

optimal solution of DSqB!'. 


Proof: 

For any c G qg^ZQ^c)) is hnite and continuous as a function of (3. Consequently, 
the assumption on the numerical integration scheme applies and the objective function 
of converges pointwise to that of DSqR^, as p —)■ cx). 

The objective functions are also hnite and convex in c, which follows directly 
from the convexity of fa on and the affine form of Z^ as a function of c. Conse¬ 
quently, by Theorem 7.17 in Rockafellar and Wets (1998), the objective function of 
^Num epiconverges to that of DSqBJ' and the conclusion follows from Theorem 7.31 
in Rockafellar and Wets (1998). 

□ 

While specialized solvers, such as Portfolio Safeguard in American Optimal 
Decisions, Inc. (2011), handle P^’lf^ directly with little difficulty under many circum¬ 
stances, the problem is typically nonsmooth and standard nonlinear programming 
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solvers may fail. However, following a simple reformulation of , utilizing equa¬ 
tion (II. 7), we obtain the equivalent linear program formally stated below, where we 
assume for convenience that /3fj, < 1. 


Problem: 


Num,LP • 


1 

mm- 

c,u,v 1 — a 


s.t. - {c, h{x^)) - Ui 


0 


c 

u (wq) rti,..., ti^) 
V = (^0,1, 


< 

< 

e 

e 

e 


- {c,h{x^))) 
i=i 

Vij, i = 0, l,...,/r ,j = l,...,z/ 
vij, i = 0, ,j = l,...,z/ 

]R^ 


If /3f^ = 1, then a straightforward modihcation is required based on the fact that 
gi(ZQ(c)) = msiXj=i^ 2 ,...,uy^— {c,x^). This linear program consists of 
variables and 2u{fi + l) constraints, which may be substantially less than what follows 
from the analytical integration approach for large u. Here we assume that the weights 
Wi > 0,i = 0, l,...,yU, are given and therefore not accounted for in the complexity 
analysis results. For example, in Chapter IV we assume that the y + 1 subintervals 
have the same weights. And in practice, we hnd that a moderately large y suffices as 
shown in the numerical examples discussed in the same chapter. 

B. DUAL METHODS 

We now turn to a distinct perspective towards the alternative superquantile 
regression problem DSqR. We use the theory of the dualization of risk to build a 
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dual problem as described in the next subsection. We then solve this new problem 
using different algorithms, as seen in Subsections III.B.2 through III.B.4. 

1. Dualization of Risk 

We start this subsection by recalling the risk measure TZa corresponding to the 
superquantile as the statistic. According to equation (11.18), the measure of deviation 
for our superquantile-based quadrangle is described as follows 

1 

V^{Y) = 7^„(y) -E[Y] = / qp{Y)d/3 -E[Y] = UY) -E[Y], (III.3) 

^ ^ J a 

where TZaiY) = qa{Y) is the risk measure for which we build the dual. 

Next we turn to the dualization of risk measures and derive results that we 
can apply to our deviation-based superquantile regression problem DSqR. By the 
Envelope Theorem in Rockafellar and Uryasev (2013), an alternative formula for a 
positively homogeneous regular risk measure TZ{-) is given by its dnal representation, 
described as follows 

n{Y) = sup{E[YQ]}, (III.4) 

QeQ 

where Q is a nonempty closed convex set that is to the risk envelope associated with 
TZ. For Y E and a G (0,1), a that maximizes 

sup{E [YQ]} 

Q&Q 

is called a risk identifier. If is a risk identiher, then obviously 

n{Y) = E[YQ^]. (III.5) 

Clearly, when we have a risk measnre TZiY) = we get Q = 1. And for TZiY) = 

snpF, we obtain Q E {Q E C‘^ \ Q > 0,E[Q] = 1}. 

For the general treatment of risk identihers, we refer to Rockafellar and Royset 
(2014c). We consider the case where hi is finite and P({a;}) > 0, for a; G 11, to avoid 

technical issues regarding measurability. We let 11/3(R) = {cu G H | R(ca) = qjsiY)}, for 



(3 G (0,1), and -Fy (y) denote the left-continuous point of the cumulative distribution 
function Fy- 

Below we derive a risk identiher formula for the superquantile at Y and prob¬ 
ability level G (0,1). 


Proposition III.2. (Rockafellar & Royset, 2014c) For {3 G (0,1) and Y G a risk 
identifier for qy{Y) is given by 





0 


V 


tfY{u)>qy{Y), 
zfY{u) = qy{Y), 
otherwise, 


with 0 < ryi^u) < 1/(1 — /d) for u E Q such that 

Fy{qy{Y))-fi 


leipiY) 


ry{u)dP{u) = 


1-/9 


(III.6) 


We now turn to the risk identiher for our choice of measure of risk in problem 
DSqR, the a-second-order superquantile. We interpret 0 times —oo as zero. Let 
be a risk identiher for fa (P). 


Proposition III.3 

distribution with v 
identifier of faiY), 


. (Rockafellar & Royset, 2014c) Suppose that Y has a discrete 
possible realizations. Then, for a G (0,1) and Y G C^, a risk 
is given by 
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( 


1 

1—a 


log 


1—a 

1-Fy{Y{uj)) 


1 

1—a 



1—a 

l-F-iYico)) 


+ 1 


tf a < Fy{Y{u)) = Fy{Y{u)) < 1 
if a < Fy{Y{uj)) < Fy{Y{uj)) = 1 


1 

1—a 



1—a 

l-F-iYico)) 


+ 1 




l-Fy(FH) 

^ Fy(Y(l,))-F-(Y(u)) 


log 


i-Fy(yH) 1 
l-Ff (y(a;)) J 


< 


1 

1—a 


FY(Y(u))-a 

Fy{Y{u^))-F-{Y{uj)) 


tf a < Fy{Y{u)) < Fy{Y{u)) 
tfFy{Y{u))<a<FY{Y{u)) = l 


1 

1—a 


FY{Y(u))-a 

Fy{Y{u^))-F-{Y{uj)) 


+ 


1-Fy{Y{u)) , 1-Fy(Y(u)) 1 

Fy{Y{u,))-F-{Y{u,)) 1-a J 


tfFy{Y{u))<a<FY(Y{u)) 
and Fy{Y{uj)) < Fy{Y{uj)) 


0 otherwise. 

In view of Theorem 4.13 in Rockafellar and Royset (2014c), and equations 
(IIL3) and (IIL4), we are now able to build a dual method to solve the Deviation- 
based Superquantile Regression Problem DSqR. 

Consider the risk identiher of ga(-^o(c)), as dehned in Proposition III.3, 

for a probability level a G (0,1). Then, according to equation (III.3), we have that 


D,(Zo(c)) = ^,(Zo(c))-E[Zo(c)] 

= ^«(Zo(c))-P;[Zo(c)] 

= i?[Zo(c)Qf°(^)] - E [Zo(c)]. (III.7) 


And we are able to dehne the objective function of this new problem as follows 

/(c) = i^Z„(c)raQfW(i)-(;ZZo(c)' (III.8) 

i=l j=l 

= IY - (c. I‘(Y>))) i E (»' - (c. ft(it)). 

^ i=l ^ j=l 

where Eo(c)*'*^ is the i*^-ordered value of Zq{c). The evaluation of the objective func¬ 
tion requires the computation of Qa°^^^■ According to Proposition III.3, this implies 
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sorting vector Zq(c) for a given c to obtain its cumulative distribution function and 
only then evaluate using the same sorting as for Zo(c)*^*^. A subgradient of 

/(c) is then easily computed as follows 

v/(c) = Eft + ; E. (111-9) 

i=l j=l 

with h maintaining the same ordering as in Zo(c)*'*^ used in (III.8). 

The Approximate Dualization-of-risk Superquantile Regression Problem is 
dehned as: 


Problem: 

D" : mm A (ZS(c)) = ^ E ^ E 

i=l j=l 

with Cq being obtained by setting Cq = ^^(^(((c^)), and given by Proposi¬ 

tion III.3. 

We now turn to the implementation of these results. In the next subsections we 
present three algorithms that are well known. First we start with a simple algorithm, 
the subgradient method, and then move to an heuristic algorithm, the coordinate 
descent method, and hnish off with the cutting plane method. There are obviously 
many other possible algorithms we could implement when solving the dual methods, 
but we omit such investigation and only discuss these three as examples. 

2. Subgradient Method 

The subgradient method was originally developed by Naum Z. Shor and oth¬ 
ers in the 1960s and 1970s; see Shor (1985). It is a simple algorithm that can be 
implemented for solving a wide variety of problems, such as the minimization of 
nondifferentiable convex functions. 

The subgradient method is an iterative algorithm that aims to minimize a 
convex function /, by iteratively obtaining a new according to the following 
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scheme 


c^+i = c^-(5^V/(c^), (III.IO) 

where V/(c^) is any subgradient of / evaluated at c^, and is the stepsize used in 
iteration k. As a downside, this algorithm is not a descent method and it is possible 
to obtain increased objective function values in any iteration, therefore we need to 
store the best obtained objective function value by setting = min /fewest}- 

In fact, if we obtain the best function value so far in iteration k, we also need to store 
This way we guarantee to have = min |/(c^), /(c^),..., /(c^)} stored 
for later use. 

There are obviously many rules to dehne the stepsize used in algorithm SM, 
as we describe below. For example, one could use a step with constant length instead, 
= (^/llV/(c^)|| 2 , so that — c ^||2 = 5, or perhaps a diminishing stepsize, such 

as 5k = 7i/(/^ + 72 ), with 71 and 72 being some positive scalars. The importance of 
the right choice of stepsize 5^ becomes more aparent when we discuss computational 
performances, later in Section III.C. 

We now formally describe the subgradient method. 


Algorithm SM: 

Step 0. Choose an initial guess c° G IR^. Set k := 0. 

Initialize := 00 , and := 0. 

Step 1. Compute f{c^) and V/(c^), using Equations (III.8) and 
(III.9), respectively. 

If V/(c^) = 0, then is an optimal solution, stop. 

Step 2. Set = min /(c^)}, and let = k ii /(c^) = 

Step 3. Choose stepsize 5^, with > 0. 

Step 4. Dehne — 5^Vf{c^). 

Replace khy k + 1 and go to Step 1. 
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3. Coordinate Descent Method 

The coordinate descent method is an heuristic algorithm that is simple to 
implement. In this method, the objective function is minimized along one coordinate 
direction per iteration and a cycle is complete when all coordinates have been utilized 
in this process. Although we could dehne any permutation of coordinates as the order 
for the coordinate search, we will use the cyclical order for simplihcation. We beneht 
from the possibility of computing the subgradient of the objective function, as dehned 
in (III.9), to perform line search in each coordinate direction. 

We now formally describe the coordinate descent method. 

Algorithm CDM: 

Step 0. Start with an initial guess c° G J?™'. 

Set the cycle counter k ■.= 1. 

Step 1. Choose coordinate 1 and compute c\ G argmin^^ /(ci, C 2 ^, ..., ^). 

Step 2. Choose coordinate 2 and compute C 2 G argmin^^ /(ci, C 2 , ..., ^). 

Step m. Choose coordinate m and compute G argmin^^ /(ci, C 2 , ..., c^^)- 
Replace cycle khy k + 1 and go to Step 1. 

This algorithm terminates according to the threshold tolerance e > 0, inputted 
by the decision maker. For simplicity, we use the formula — f{c^) < e as our 

stopping criteria. 

4. Cutting Plane Method 

We hnish Section III.B by describing the third algorithm we implement in the 
numerical examples, in Chapter IV: the cutting plane method, which is guaranteed 
to achieve an optimal solution if one exists. 

The idea behind this algorithm is to solve an approximate linear program 
each iteration. The cutting plane method starts off with our original unconstrained 
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problem and with every iteration we obtain a cut to the feasible region that we add 
as a new constraint for the following linear program. So it approximates the feasible 
region by a hnite set of closed half spaces and solves a sequence of approximating 
linear programs until the optimal solution is found. As we notice, the size of the 
linear program grows with the number of iterations and becomes rather slow for a 
larger number of variables. 

The cutting plane method is usually used in integer or mixed integer linear 
programming problems but is also very popular when applied to convex minimiza¬ 
tion problems whenever the objective function value and its subgradient are easily 
computed, as we describe in detail below. Consider our minimization problem 


min 

cei?™ 


/(c) 




Using V/(c°), see Equation (III.9), at an initial guess c° G J?™, we are able to build 
a relaxation to our problem, as follows 


min f 

be 

s.t. /(c°) + V/(c°)^(c-c°) < e, (in.ll) 

with ^ & M being a dummy variable. If we keep adding a new constraint per it¬ 
eration k, as in (III.11), but now applied to the obtained optimal solution we 
construct the linear programming problem with K constraints, where K denotes the 
total number of iterations. 


min f 

s.t. /(c^) + V/(c^)^(c-c^) < e, A: = 0,...,if. 


We now formally state the cutting plane method. 
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Algorithm CPM: 


Step 0. Start with an initial guess c° G Set k := 0. 

Step 1. Compute /(c^) and Vf{c^), using Equations (III.8) and 
(III.9), respectively. 

If V f{c^) = 0, then is an optimal solution, stop. 

Step 2. Solve the Linear Program 

min ^ 

Lc 

s.t. /(c*) + V/(c*)^(c-C) < ^, i = 0,...,k. 
Step 3. Get obtained optimal solution c from Step 2 and set = c. 
Replace k hy k + 1 and go to Step 1. 


In the next section, we compare computational performances of the algorithms 
we present in CHapters III.A and III.B. We also compare these complexity results 
with least squares and quantile regression in order to understand how good these 
presented computational methods are. 

C. COMPLEXITY 

In the previous two sections we present different computational methods for 
the superquantile regression problem. When implementing these methods, it is useful 
to know how efficient and costly they are. In this section, we compare primal ver¬ 
sus dual methods in terms of worst-case complexity, and analyze the computational 
performances of least squares and quantile regressions. 

1. Least Squares Regression 

In the case of least squares regression we have a system with u linear equa¬ 
tions, due to the u observations in the data set, and m -|- 1 unknown coefficients, 
(co. Cl,..., Cm)- We let X be a design matrix of dimension u hy {m + 1), with all 
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elements in the first column being set equal to 1 in order for us to be able to include 
the intercept cq in the regression model. 

Then the best htting coefficients (cq, c) are the ones obtained by solving the 
quadratic minimization problem 


min > 

(co,c)ei?™+i 


y - Co 


Ea- 

i=i 




and, in matrix notation, are equal to 




(co,c) = (x^xy^x^y. 


(III.12) 


In terms of computational cost this algorithm implies: multiplying X~^ by X, which 
takes 0{u{m + 1)^) arithmetic operations; multiplying X~^ by y, which takes another 
0{u{m + 1)) arithmetic operations; computing the LU factorization of {X~^X), which 
takes another 0{{m + 1)^) arithmetic operations; and hnally multiplying {X"^X)~^ 
by {X"' y), which takes 0{{m + 1)^). So overall the running time of this procedure is 
0{vny), assuming of course that v > m and X is a full rank matrix. 


2. Quantile Regression 

As discussed in Subsection II.A.3, the quantile regression function is obtained 
by minimizing the (scaled) Koenker-Bassett error measure (Koenker, 2005). This 
problem can be rewritten as a linear program as follows 


min q; M + (1 — a) ij u 

C0,C,'U,f 

s.t. Co + (c,/i(a;*)) + Mj - Ui 

= 

y\ 

Co 

G 

R 

c 

e 


u = (Ui, ...,Uu) 

G 

JR'' 

V = (ui, ... ,u,,) 

G 



where 1J denotes a transposed z/-dimensional vector of ones. This linear program has 
a total of 2z/ + m + 1 number of variables and v number of equality constraints. For 
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US to be able to proceed with the computational performance analysis, we need to 
transform the problem into standard form. Summarizing we then have 2(2z/ + m + 1) 
variables and v equality constraints. 

Solving this linear program by means of an interior point method takes 0((4z/+ 
2m + 2)^'^) operations to produce a solution. The path following algorithm is one 
of such interior point methods. Monteiro and Adler (1989) rehned the path follow¬ 
ing algorithm to converge in 0 (a/2(2z/ -|- m -|- 1) log(eo/e)) iterations by reducing the 
duality gap from cq to e, with O ((4z/ -|- 2m -|- 2)^) arithmetic operations per iteration. 

The quantile regression implementation takes a total of O log(eo/e)), as¬ 
suming that V is larger than m. Specialized algorithms (see for example Koenker, 
2005) improve on this solution approach, but further discussions are beyond the scope 
of this dissertation. 

3. Superquantile Regression — Primal Methods 

Let us start with the analytical integration presented in Subsection III.A.l. We 
determine the computational performance of this method when the resulting linear 
program is solved using an interior point method. 

In order to determine the computational performance of problem P^p, we 
need to transform Ppp into a standard form linear programming problem. After this 
transformation, we have 2 [{u — z/q,)(z/ + 1) + m + 1] + v variables and {v — z/q,)z/ -|- v 
equality constraints. Since = [z/a], with a being usually close to 1, z/q, is almost 
as big as the number of observations v in the data set. 

As done with the computational performance in the quantile regression case, 
we use the convergence results we hnd in Monteiro and Adler (1989). The primal 
method using analytical integration takes a total of O (z/’^ log(eo/e)). 

Let us now turn to the numerical integration method described in Subsection 
III.A.2. Problem is a linear program with m -|- p -|- 1 -|- z/(/i -|- 1) variables and 

2z/(/i-|-l) inequality constraints. After transforming ^ standard form linear 

program, we have 2(/iz/ + ij, + u + m + l) variables and (p -|- l)z/ equality constraints 
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in the primal method with numerical integration. Since the number of observations 
V and integration subintervals fi are both usually large numbers and we disregard 
the inputted weights in our complexity analysis, the implementation of the primal 
methods for superquantile regression takes a total of O log(eo/e)). 


4. Superquantile Regression — Dual Methods 

We now compare the computational performance of the dual methods. Since 
in the numerical examples we implement the subgradient method using a constant 
stepsize rule, we analyze the computational performance of this algorithm under this 
circumstance. 


Let (i(c°) = miuggR™ ||c° — c|| be the distance between the initial guess c° and 
the optimal solution c. And let {c^} be the sequence generated by the subgradient 
method, with the stepsize <5^ fixed at some positive constant S, with k G {l,2,...,iL}. 

Then, according to Proposition 6.3.3 in Bertsekas (2009), for any scalar e > 0, 
we have that 

_ 1 _ f 

min /(c^) < /(c) H-, (III.13) 

0<k<K'' ^ > 2 ^ ^ 


where K is given by 


K = 


d{d 


,0'i2 


(5e 


(III.14) 


with M being the upper bound on the norm of V/(c^) G df{c^),yk > 0. The number of 
iterations is independent of the number of variables in the problem. The most costly 
operation of this algorithm in our case is the computation of V/(c^) at any given 
iteration k. Since the vector Zq{c^) needs to be sorted in order to compute Vf{c^), 
as stated in equation (III.8), the subgradient method takes 0(z/logz/) operations per 
iteration. Note that by establishing S = e/u, we can obtain an e-optimal solution in 
0(l/e^) iterations. So the subgradient method takes a total of 0((l/e^)z/logz/). 

We note that we present the complexity result for the slowest of the described 
dual methods. Implementing the Nesterov’s optimal method (see Nesterov, 1983) 
improves the obtained result for a total of 0((l/e)z/log z/). 



These results show that dual methods are not much slower than solving for 
least squares regression and such a conclusion is promising for superquantile regres¬ 
sion. 

In the next chapter we present a series of numerical examples that allow us to 
compare runtimes of the various algorithms. 
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IV. 


NUMERICAL EXAMPLES 


In this chapter, we illustrate superquantile regression in several numerical ex¬ 
amples. We start with a simple example that allows us to compare computational 
methods in terms of runtimes, solution vectors and function values. Then we ap¬ 
ply superquantile regression to the well-known data sets, Engel data and Brownlee 
stack loss plant data, and compare the obtained superquantile regression models to 
least squares and quantile regression functions. In the fourth example, we apply su¬ 
perquantile regression to an investment analysis problem taken from a case study 
of the Portfolio Safeguard documentation (American Optimal Decisions, Inc., 2011). 
The hfth and sixth examples address military applications, the hrst concerning U.S. 
Navy helicopter pilots and the second Portuguese Navy submariners, and in both 
examples their mission employment. We then show an example that arises in uncer¬ 
tainty quantihcation of a rectangular cross section of a short structural column under 
uncertain yield stress and uncertain loads. Finally we revisit the first example in or¬ 
der to address the issue of superquantile tracking. We experiment different regression 
models. We compare the obtained solution vectors, coefficients of determination and 
adjusted coefficients of determinations, and implement Cook’s distances applied to 
superquantile regression. 

Computations are mostly carried out in Matlab version 7.14 on a 2.26 GHz 
laptop with 8.0 GB of RAM using Windows 7. However we implement both least 
squares and quantile regression in R programming language (R Development Core 
Team, 2008). Specihcally for solving the superquantile regression problem with a 
numerical integration scheme , we use Portfolio Safeguard in Matlab, by Amer¬ 
ican Optimal Decisions, Inc. (2011), with VAN as the optimization solver. Since we 
assume the subintervals are equally likely when solving for the primal method using 
numerical integration schemes, from now on we denote problem HnuiiT ^y P^tm 
stead, and assume tCj = l//i, for i = 0,1,..., /i. When solving the primal method with 
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analytical integration, P{^p, and the dual method, D'', we employ GAMS version 23.7 
with the CPLEX 12.3 solver. 

A. COMPUTATIONAL COST 

We start by considering a loss random variable 

F = Xi + Xae, 

where e is a standard normal random variable and X = (Xi,X 2 ) is uniformly dis¬ 
tributed on [—1,1] X [0,1], with e,Xi, and X2 independent. We consider a regression 
function of the form f{x) = Cq + CiXi + C 2 X 2 and set a = 0.90. This simple example 
serves as a comparison tool between computational methods and their performances, 
as well as the obtained approximate solution vectors, since we know the conditional 
superquantile, which in this case is (cq, ci, C2) = (0,1,1.755). The detailed explanation 
can be seen in Section IV.H. We give an initial guess (c5,C2) = (0,0) when imple¬ 
menting the dual methods, and Cq is consecutively computed utilizing the regression 
vector (ci,C 2 ) obtained by the implemented algorithm. 

We hrst examine the computational effort required to obtain an approximate 
regression vector. Table 1 shows computing times for solving problem P^p for increas¬ 
ingly larger sample sizes u obtained by independent draws from (e,Xi,X 2 ). While 
the results correspond to single instances of Ppp, the times vary little between two 
instances of the same size and the computing times are therefore representative. As 


V 

100 

200 

300 

400 

500 

600 

700 

800 

900 

1000 

1500 

2000 

Time 

0 

0 

2 

6 

17 

32 

45 

65 

163 

174 

996 

2,972 


Table 1. Example A: Computing times (sec.) for solving Ppp for increasingly larger 
sample sizes u. 


predicted from the theoretical results discussed at the end of Subsection III.A.l, the 
computing time grows rapidly as the sample size u increases. In addition to the 
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inconvenience of long computing times, memory requirements become problematic. 
Therefore solving Plp for large sample sizes is challenging, if not impossible, and we 
examine alternative approaches. 

Second, we consider the alternative primal method approach based on solving 
-^Num- While this approach introduces a numerical integration error. Proposition III.l 
ensures that the error is negligible for large /i. In fact, as we see next empirically, 
moderately large fi suffices for probability levels a close to one. Moreover, the sub¬ 
stantial reduction in problem size, as compared to that of Ppp, reduces computing 
times dramatically. 

Since g/ 3 (ZQ(c)) may be nonsmooth as a function of (3, standard numerical 
integration error bounds may not apply. However, since q^^ZQ^c)) is continuous and 
nondecreasing as a function of (3, the use of left-endpoint and right-endpoint numerical 
integration rules in P^^ provide lower and upper bounds on the optimal value of 
DSqR'^, respectively. Table 2 shows solution vectors (co,Ci,C 2 ) for /i = 100, fi = 
1000, and fi = 5000 integration subintervals, when we implement left-endpoint, right- 
endpoint, and Simpson’s numerical integration schemes, for sample sizes of u = 100, 
u = 1000, u = 10000, and u = 100000. 

For u = 100, we notice that the solutions are insensitive to the numerical 
integration rule as well as the subintervals /i specihc to the integration scheme. The 
obtained solutions are essentially identical to the regression vector obtained from Ppp; 
see Row 2 of Table 2. Here the superquantile coefficient of determination is Pq qq = 
0.5683 for all the presented cases, including Ppp, which also supports the fact that 
the numerical integration rule does not affect the obtained solution. Each one of the 
solutions of P^um ^ = 100 is obtained quickly, in about 0.08 to 8 seconds for 
fi = 100, /i = 1000, and fi = 5000; see the last column of Table 2. In this case, we 
clearly notice that fi = 100 suffices and takes less than a tenth of a second to run. 

When we increase the sample size u, we start to notice that the solution vectors 
are slightly different but the magnitudes of these differences are small for subintervals 
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Problem 

Integration Rule 

V 

h 

Co 

Cl 

C2 

Time 

pu 

-'^LP 

NA 

100 

NA 

0.0630 

1.0951 

1.5841 

0.05 


Left Endpoint 

100 

100 

0.0630 

1.0951 

1.5841 

0.07 


Right Endpoint 

100 

100 

0.0630 

1.0951 

1.5841 

0.08 


Simpson’s 

100 

100 

0.0630 

1.0951 

1.5841 

0.09 


Left Endpoint 

100 

1000 

0.0630 

1.0951 

1.5841 

0.79 

^ Num 

Right Endpoint 

100 

1000 

0.0630 

1.0951 

1.5841 

0.83 


Simpson’s 

100 

1000 

0.0630 

1.0951 

1.5841 

0.77 


Left Endpoint 

100 

5000 

0.0630 

1.0951 

1.5841 

7.81 


Right Endpoint 

100 

5000 

0.0630 

1.0951 

1.5841 

7.24 


Simpson’s 

100 

5000 

0.0630 

1.0951 

1.5841 

7.10 

pu 

-'^LP 

NA 

1000 

NA 

0.0680 

1.0108 

1.7322 

174.24 


Left Endpoint 

1000 

100 

0.0689 

1.0112 

1.7290 

0.12 


Right Endpoint 

1000 

100 

0.0658 

1.0099 

1.7398 

0.13 


Simpson’s 

1000 

100 

0.0680 

1.0108 

1.7322 

0.13 


Left Endpoint 

1000 

1000 

0.0683 

1.0112 

1.7310 

1.29 

■^Num 

Right Endpoint 

1000 

1000 

0.0678 

1.0106 

1.7327 

1.26 


Simpson’s 

1000 

1000 

0.0680 

1.0108 

1.7322 

1.21 


Left Endpoint 

1000 

5000 

0.0680 

1.0109 

1.7321 

10.91 


Right Endpoint 

1000 

5000 

0.0680 

1.0108 

1.7322 

11.44 


Simpson’s 

1000 

5000 

0.0680 

1.0108 

1.7322 

9.76 


Left Endpoint 

10000 

100 

0.0835 

1.0049 

1.6374 

0.58 


Right Endpoint 

10000 

100 

0.0799 

1.0050 

1.6492 

0.56 


Simpson’s 

10000 

100 

0.0818 

1.0048 

1.6429 

0.56 


Left Endpoint 

10000 

1000 

0.0820 

1.0048 

1.6423 

5.91 

■^Num 

Right Endpoint 

10000 

1000 

0.0816 

1.0048 

1.6435 

5.00 


Simpson’s 

10000 

1000 

0.0818 

1.0048 

1.6430 

5.27 


Left Endpoint 

10000 

5000 

0.0818 

1.0048 

1.6428 

28.93 


Right Endpoint 

10000 

5000 

0.0817 

1.0048 

1.6431 

32.72 


Simpson’s 

10000 

5000 

0.0818 

1.0048 

1.6429 

29.12 


Left Endpoint 

100000 

100 

0.8149 

0.2484 

1.1749 

5.05 


Right Endpoint 

100000 

100 

0.8242 

0.2411 

1.1572 

4.55 


Simpson’s 

100000 

100 

0.8176 

0.2454 

1.1702 

3.98 


Left Endpoint 

100000 

1000 

0.8152 

0.2462 

1.1750 

46.00 

^ Num 

Right Endpoint 

100000 

1000 

0.8162 

0.2454 

1.1732 

38.01 


Simpson’s 

100000 

1000 

0.8155 

0.2459 

1.1746 

46.07 


Left Endpoint 

100000 

5000 

0.8155 

0.2460 

1.1746 

307.34 


Right Endpoint 

100000 

5000 

0.8156 

0.2458 

1.1743 

330.99 


Simpson’s 

100000 

5000 

0.8156 

0.2459 

1.1744 

278.55 


Table 2. Example A: Solution vectors and computing times (sec.) for varying number 
of observations u, integration rules for solving as well as number of integration 
subintervals fi. 
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of /i = 100 and /i = 1000. For numerical integration scheme implementations with 
/i = 5000, these differences are almost inexistent, but the computing times are larger. 
Therefore the statistician should take this into consideration when selecting the num¬ 
ber of subintervals for the numerical integration scheme. It is a tradeoff between 
obtaining better solutions versus computing times. Also we notice there is another 
issue we encounter when solving superquantile regression problems for sample sizes 
as large as 100000 observations. In Rows 31-39 of Table 2, we intentionally include 
the solution vectors for v = 100000 using the same number of subintervals as imple¬ 
mented in the other cases. The discrepancies in solution vectors are consequence of 
rounding errors and we refer to Borges (2011) for further details. 

One detail that is not included in Table 2 is the coefficient of determination 
Rggg. For ull tho presented cases, the coefficient of determination takes the values of 
0.4222, 0.3917, and 0.1029, for sample sizes v = 1000, v = 10000, and v = 100000, 
respectively. We notice that Rg gg decreases as we increase the size of the data sample, 
which means that the linear model f{x) = Cq + CiXi + C 2 X 2 does not fully capture the 
variability of the data, as expected, and a study of other models may be warranted. 
However, we omit such an investigation. 

As discussed in Chapter III, the dual method is another approach to solve the 
deviation-based superquantile regression problem which theoretically demonstrates 
potential for large sample sizes. Since numerical integration not only introduces a 
numerical integration error but also takes increasingly longer to run for increasing 
sample sizes, we proceed with the implementation of the dual methods. 

Third, we solve the superquantile regression problem by implementing the dual 
methods of Section B in Chapter III, i.e., subgradient, coordinate descent, and cutting 
plane methods. Since dehning the stepsize and tolerance for the three algorithms, as 
well as the maximum number of iterations in the specihc case of the subgradient 
method, can be a difficult process, we establish the following input parameters for 
each algorithm as a natural choice for us to be able to compare all three methods. We 
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note that refining these parameters as well as implementing more efficient algorithms 
could return even better computing times but such an investigation is not the purpose 
here. Our goal with this example is to demonstrate the potential for dual methods. 
For the subgradient method, we fix the stepsize to a constant value, 5 = 0.1, and run 
the algorithm for 1000 iterations. In the case of the coordinate descent method, we 
include a tolerance of 10“^^ and define 1000 as the maximum number of iterations. 
We implement the cutting plane method with a maximum of 1000 cuts, and a gap of 
10“®. 

Table 3 shows the computing times needed for solving problem D'' for increas¬ 
ingly large sample sizes u implementing these algorithms. Here the computing times 
are also representative, for the same reason as in Table 1. As expected, the subgradi- 


V 

Computing Times 

Subgradient 

Coordinate Descent 

Cutting Plane 

100 

0.13 

0.67 

0.91 

1000 

0.34 

1.20 

2.07 

5000 

1.43 

0.70 

0.95 

10000 

2.88 

0.88 

3.00 

25000 

5.96 

3.24 

2.20 

50000 

11.92 

8.30 

1.73 

75000 

21.53 

13.78 

1.98 

100000 

27.38 

19.20 

1.78 


Table 3. Example A: Computing times (sec.) for solving D'^ using different imple¬ 
mentations of the dual methods for increasing sample sizes u. 


ent method is the slowest of the three algorithms for almost all the presented cases, 
especially for sample sizes greater than 1000 observations. In all the described dual 
methods and empirically, the computing times grow linearly with the sample size u, 
with the cutting plane method having the smallest slope of the three, as shown in 
Figure 6. 

In Figure 7, we picture the computing times for primal versus dual methods, 
in logarithmic scale. Here we choose to present the Simpson’s integration rule with 
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Figure 6. Example A: Computing times for solving D'^ with three different algorithms 
(subgradient, coordinate descent, and cutting plane methods), for increasing sample 
sizes u. 

/i = 1000 subintervals and the dual methods algorithms with the input parameters 
as stated before. We clearly notice that implementing the cutting plane method 
improves the computational performance, especially for large sample sizes. Also, for 
larger samples sizes and smaller probability levels a, we certainly need to increase 
the number of integration subintervals /r. 

We also compare the obtained solution vectors and corresponding objective 
function values; see Table 4. Again we note that it is not possible to solve P^p for 
sample sizes larger than z/ = 1000. We use Simpson’s rule with /x = 1000 intervals as 
the numerical integration scheme for all sample sizes. We realize that the obtained 
solution vectors are nearly identical. 

Finally we analyze how changing the probability level a and the number of 
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Figure 7. Example A: Primal versus dual methods computing times for increasing 
sample sizes u, in logarithmic scale. 

observations u affect the computational performance of the implemented algorithms. 
Obviously, the primal methods are affected by changes in the probability level a since 
the integral in problem DSqR^ is dehned between a and 1. The smaller the value of a, 
the smaller z/„ = gets, and consequently the number of variables and inequality 
constraints in problem Plp increases due to the increased difference (z/ — z/q). In 
the numerical integration schemes, the smaller a gets, the more subintervals jj, are 
required to obtain same accuracy. 

As we can observe in Table 5, the sample size z/ influences the computing times 
of the subgradient method, but the probability level a does not produce such an effect. 
We note that the computing times for the sample sizes v = 100 and v = 1000 are not 
exactly the same for all the presented probability levels a. These values differ in the 
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Computational 

Method 

V 

h 

Co 

Cl 

C2 

Function 

Value 

Time 

Analytical Int. 

100 

NA 

0.0630 

1.0951 

1.5841 

0.844477 

0.05 

Numerical Int. 

100 

1000 

0.0630 

1.0951 

1.5841 

0.844477 

0.77 

Subgradient 

100 

NA 

0.0673 

1.1004 

1.5699 

0.844575 

0.13 

Coord. Descent 

100 

NA 

0.0631 

1.0952 

1.5839 

0.844478 

0.67 

Cutting Plane 

100 

NA 

0.0630 

1.0951 

1.5841 

0.844477 

0.91 

Analytical Int. 

1000 

NA 

0.0680 

1.0108 

1.7322 

1.049276 

174.24 

Numerical Int. 

1000 

1000 

0.0680 

1.0108 

1.7322 

1.049276 

1.21 

Subgradient 

1000 

NA 

0.0680 

1.0109 

1.7321 

1.049276 

0.34 

Coord. Descent 

1000 

NA 

0.0680 

1.0109 

1.7321 

1.049276 

1.20 

Cutting Plane 

1000 

NA 

0.0680 

1.0109 

1.7320 

1.049276 

2.07 

Analytical Int. 

10000 

— 

— 

— 

— 

— 

— 

Numerical Int. 

10000 

1000 

0.0818 

1.0048 

1.6430 

1.092066 

5.27 

Subgradient 

10000 

NA 

0.0818 

1.0048 

1.6429 

1.092033 

2.88 

Coord. Descent 

10000 

NA 

0.0834 

1.0047 

1.6378 

1.092040 

0.88 

Cutting Plane 

10000 

NA 

0.0817 

1.0049 

1.6432 

1.092033 

3.00 


Table 4. Example A: Solution vectors and computing times (sec.) for the superquan¬ 
tile regression problem with varying computational methods, and sample sizes v. 


Dual 

Method 

a 

V 

Co 

Cl 

C2 

Function 

Value 

Time 



100 

-0.0478 

1.1038 

0.6420 

0.502796 

0.14 


0.25 

1000 

-0.0557 

0.9988 

0.6726 

0.584710 

0.33 



10000 

-0.0398 

0.9990 

0.6048 

0.608587 

2.61 

Subgradient 


100 

0.0163 

1.0901 

0.8440 

0.598695 

0.14 


0.50 

1000 

0.0309 

0.9943 

0.8822 

0.705208 

0.33 

Method 


10000 

0.0390 

1.0014 

0.8239 

0.732942 

2.99 



100 

0.0390 

1.1029 

1.2056 

0.729180 

0.14 


0.75 

1000 

0.0762 

0.9976 

1.2239 

0.875049 

0.33 



10000 

0.0742 

1.0009 

1.2050 

0.905114 

2.93 


Table 5. Example A: Solution vectors and computing times (sec.) for solving 
when implementing the subgradient method with varying probability levels a and 
number of observations v. 
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third decimal places, which makes the magnitude of such differences negligible. 


Table 6 presents the solution vectors and computing times for the coordinate 
descent method for different probability levels a and sample sizes u. Similarly to the 
subgradient method, we realize that only the sample size u has a signihcant effect on 
the computing times. 


Dual 

Method 

a 

V 

Co 

Cl 

C2 

Function 

Value 

Time 



100 

-0.0478 

1.1038 

0.6419 

0.502796 

0.62 


0.25 

1000 

-0.0227 

0.9995 

0.5937 

0.585203 

0.16 

Coordinate 


10000 

-0.0392 

0.9990 

0.6034 

0.608588 

2.51 



100 

0.0163 

1.0901 

0.8439 

0.598695 

0.65 

Descent 

0.50 

1000 

0.0340 

0.9943 

0.8734 

0.705218 

0.21 



10000 

0.0403 

1.0014 

0.8205 

0.732944 

1.58 

Method 


100 

0.0390 

1.1029 

1.2056 

0.729180 

0.70 


0.75 

1000 

0.0771 

0.9977 

1.2213 

0.875050 

0.21 



10000 

0.0763 

1.0009 

1.1987 

0.905121 

1.03 


Table 6. Example A: Solution vectors and computing times (sec.) for solving D‘' 
when implementing the coordinate descent method with varying probability levels a 
and number of observations u. 


Dual 

Method 

a 

V 

Co 

Cl 

C2 

Function 

Value 

Time 



100 

-0.0479 

1.1038 

0.6420 

0.502797 

1.13 


0.25 

1000 

-0.0556 

0.9989 

0.6723 

0.584710 

1.51 

Cutting 


10000 

-0.0399 

0.9989 

0.6049 

0.608587 

2.90 



100 

0.0162 

1.0901 

0.8440 

0.598695 

2.23 

Plane 

0.50 

1000 

0.0307 

0.9944 

0.8827 

0.705208 

1.21 



10000 

0.0389 

1.0017 

0.8243 

0.732943 

1.22 

Method 


100 

0.0390 

1.1029 

1.2056 

0.729180 

0.80 


0.75 

1000 

0.0763 

0.9976 

1.2238 

0.875049 

1.97 



10000 

0.0739 

1.0008 

1.2059 

0.905114 

1.16 


Table 7. Example A: Solution vectors and computing times (sec.) for solving D‘' 
when implementing the cutting plane method with varying probability levels a and 
number of observations u. 
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A different result is obtained when we implement the cutting plane method, 
as shown in Table 7. The computing time differences are not signihcant in any of the 
cases. Here we run the cutting plane method with bounds on the vectors {cf, C 2 ), for 
iteration k. Decreasing these bounds by making them more restrictive, and reducing 
the maximum number of cuts that the algorithm can add, reduces the computing 
times shown by Column 8 in Table 7, and the magnitudes of the computing times 
differences become even less signihcant. 

Out of curiosity, if we implement the subgradient method for this example with 
ten times more iterations, i.e., a total of 10000 iterations, and reduce the stepsize to 
6 = 0.01, we hnd that the solution vectors are exactly the same as the ones presented 
in Table 5, with the same objective function values, but the computing times increase 
by at least a factor of 10 in the cases of u = 10000 and u = 100000, a factor of 18 for 
u = 1000, and a factor of 30 for u = 100. This shows how important the selection of 
the right stepsize and maximum number of iterations is. 

From this example we conclude that for small sample sizes it is benehcial to 
run the primal method using analytical integration, since we obtain the exact solution 
vector and the computing times are not drastically higher than solving D''. As the 
sample size increases, the results show that we should rely on the dual method and 
implement the cutting plane method or any other algorithm that is comparable to 
the cutting plane method. Another aspect we observe is the fact that the probability 
level a does not produce any visible effect on the dual methods computing times. To 
the contrary, the primal methods, with analytical or numerical integration schemes, 
are clearly affected due to changes in a since the integral interval is adjusted accord¬ 
ingly and the number of variables changes. Implementing the primal methods with 
numerical integration schemes implies the wise selection of the number of subintervals 
/i according to the sample size u and probability level a. 
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B. ENGEL DATA 

This next example is based on a data set originally worked by Ernst Engel in 
1857, and used by Koenker and Bassett in their regression quantiles studies (Koenker 
& Bassett Jr., 1982). Engel presents this data set to show his hypothesis that the 
annual expenditures on food for Belgian working class families increase less than the 
increase of their annual household incomes. In Koenker (2005), the author uses this 
data set as an example to address the issues of estimating the asymptotic covariance 
matrix in statistical inference for quantile regression and estimates six quantile re¬ 
gression functions for probability levels a G {0.05, 0.10,0.25, 0.75, 0.90, 0.95}. Eor this 
example, we are interested in comparing these obtained quantile regression functions 
with superquantile regression functions at the same probability levels a, and verify 
how both regression techniques differ conceptually. 

We have a data set of 235 observations of the income and the expenditure on 
food in Belgian francs for Belgian working class annual households, see Figure 8(a). 
As done in Koenker (2005), we also transform both variables to the logarithmic scale, 
see Figure 8(b). We seek to quantify the food expenditure Y and consider a linear 
regression function fi{X) = cq J- cX, where X is the explanatory random variable 
that represents the household income for Belgian working class. 

In Figure 9(b), we observe the a-quantile regression models, for probability 
levels a G {0.05,0.10,0.25,0.50,0.75,0.90,0.95}, in logarithmic scale. Here we also 
include the least squares and the 0.50-quantile regression functions for comparison 
and highlight the obtained 0.75-quantile regression function that we use later in this 
example. Although some of these quantile regression functions look parallel, their 
slopes are distinct; see Koenker (2005) for further discussion. These slope differences 
are more evident in Figure 9(a). 

In Figure 10, we present the a-superquantile regression models, for different 
probability levels a G {0.05,0.10,0.25,0.50,0.75,0.90,0.95}. Again we include the 
least squares and the 0.50-superquantile regression functions and highlight the ob- 
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(a) Original display. 
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(b) Logarithmic scale display. 

Figure 8. Example B: Engel data set 







(a) Original display. 



(b) Logarithmic scale display. 


Figure 9. Example B: Least squares and quantile regression functions, for varying a. 
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tained 0.75-superquantile regression fit. 

An interesting detail shown in Figure 9 is that the obtained quantile regression 
models nearly “span” the observations, i.e., we have regression functions above and 
below the least squares regression £t. As we observe in Figure 10, the superquan¬ 
tile regression models for varying probability levels a do not have such property. 
One would have to change the orientation of the original problem in order to obtain 
regression functions below the least squares regression model, since ^o(^) = 

In order to compare the obtained regression vectors and the corresponding 
coefficient of determination for the model fi{x) = cq 4- cix, we refer to Table 8. We 
cosider the same probability levels a as shown in Figures 9 and 10. Due to the small 
sample size, 235 observations, we solve the deviation-based superquantile regression 
problem by analytical integration, P^p. We refer to Figure 11(a) to show how close the 
quantile and superquantile linear regression functions are in the case where a = 0.75. 

We now consider a quadratic model of the form f 2 {x) = Cq + Cix -|- C 2 x‘^. 
In Fignre 11(b), we observe the different quadratic fits for least squares, quantile, 
and snperqnantile regressions. Althongh both graphs in Figure 11 show that the 
0.75-superquantile regression fnnctions look exactly alike. Figure 11(b) actually has 
a curvature that can be noted using different scales on the horizontal axis. Table 
8 shows the obtained regression vectors (co,Ci,C 2 ) for the qnadratic model, using 
distinct regression techniques. We note that the coefficient of determination for the 
linear are slightly smaller than for the quadratic models, which means that adding 
the term C 2 x‘^ slightly improves the obtained results in some sense. 

In Figure 12, we visualize quantile and snperqnantile regression fnnctions for 
varying probability levels a. It is interesting to notice how the qnantile regression fits 
are severely influenced by one observation where four quantile regression functions 
cross each other just below the least squares fit, represented by the big black dot. To 
the contrary, the obtained superquantile regression fits are not greatly influenced by 
this observation and depict other observations. 
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Food Expenditure Food Expenditure 



(a) Original display. 



(b) Logarithmic scale display. 


Figure 10. Example B: Least squares and superquantile regression functions, for 
varying a. 








Regression Model 

a 

Co 

Cl 

C2 

Rl 

Least Squares 

NA 

147.475 

0.4852 

— 

0.8304 


0.05 

124.880 

0.3434 

— 

— 


0.10 

110.142 

0.4018 

— 

— 


0.25 

95.4835 

0.4741 

— 

— 

Quantile 

0.50 

81.4822 

0.5602 

— 

— 


0.75 

62.3966 

0.6440 

— 

— 


0.90 

67.3509 

0.6863 

— 

— 


0.95 

64.1040 

0.7091 

— 

— 


0.05 

18.8791 

0.6370 

— 

0.6882 


0.10 

27.0860 

0.6387 

— 

0.6913 


0.25 

45.2404 

0.6425 

— 

0.7043 

Superquantile 

0.50 

52.3684 

0.6657 

— 

0.7322 


0.75 

57.3732 

0.6924 

— 

0.7716 


0.90 

77.4796 

0.7039 

— 

0.8070 


0.95 

88.6620 

0.7097 

— 

0.8223 

Least Squares 

NA 

8.0060 

0.7100 

-6.603e-5 

0.8671 


0.05 

-31.7001 

0.6815 

-1.295e-4 

— 


0.10 

52.6260 

0.5009 

-2.884e-5 

— 


0.25 

22.8226 

0.6123 

-5.009e-5 

— 

Quantile 

0.50 

5.7593 

0.7243 

-7.198e-5 

— 


0.75 

-26.0488 

0.8378 

-9.360e-5 

— 


0.90 

72.2423 

0.6724 

7.838e-6 

— 


0.95 

44.3764 

0.7445 

-1.419e-5 

— 


0.05 

-28.7584 

0.7354 

-4.243e-5 

0.6903 


0.10 

-13.3480 

0.7212 

-3.498e-5 

0.6928 


0.25 

17.2230 

0.6946 

-1.896e-5 

0.7050 

Superquantile 

0.50 

32.8155 

0.7034 

-1.439e-5 

0.7327 


0.75 

45.6962 

0.7144 

-8.130e-6 

0.7717 


0.90 

54.6966 

0.7461 

-1.467e-5 

0.8079 


0.95 

53.0274 

0.7777 

-2.522e-5 

0.8241 


Table 8. Example B: Solution vectors (co,ci) and coefficients of determination for 
the linear model of the form fi{x) = cq + cix, and solution vectors (co,ci,C 2 ) and 
coefficients of determination for the quadratic model of the form f 2 {x) = cq + cix + 
C2x'^. 
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Food Expenditure 



- 0.75-Superquantile Regression 

- 0.75-Quantile Regression 

- Least Squares Regression 

I I I 

3000 4000 5000 


Household Income 


(a) Linear model fi{x) = cq + cix. 



(b) Quadratic model f2ix) = cq + cix + C2x‘^. 

Figure 11. Example B: Regression functions for linear and quadratic models. 









Food Expenditure Food Expenditure 



(a) Quantile regression for varying probability levels a. 



(b) Superquantile regression for varying probability levels a. 


Figure 12. Example B: Least squares, quantile, and superquantile regression functions 
for the quadratic model = cq + c\x + C 2 X^. 








As a conclusion to this example, we note that superquantile regression brings 
additional information concerning the tail realizations of our loss random variable. 
The linear hts from quantile and superquantile regressions are close, with only a slight 
difference in slope. However, the quadratic superquantile model provides a distinct 
perspective. In the quadratic case, quantile regression is highly affected in a dubious 
manner by one observation. 

C. BROWNLEE STACK LOSS PLANT DATA 

This example is based on a data set with 21 observations from the Brownlee 
stack loss plant data set, which dehnes the oxidation of ammonia (NH3) to nitric acid 
(HN03) of a plant, as described in detail in Brownlee (1965). 

We seek to estimate the stack loss random variable Y, representing ten times 
the percentage of ammonia going into the plant that escapes from the absorption 
tower, using three explanatory random variables: air flow (Waf), which represents 
the rate of operation of the plant; water temperature (W^t), which denotes the tem¬ 
perature of cooling water circulated through coils in the absorption tower; and acid 
concentration (Xac), [per 1000, minus 500]. 

Figure 13 shows the scatterplot matrix of the stack loss data, where we observe 
the pairwise correlations. Here we notice a linear correlation between stack loss and 
air flow and a polynomial correlation between stack loss and water temperature. We 
explore these two possible models and compare the obtained results with coefficient 
of determination calculations, as described in Section H.C. 

We hrst consider a linear model of the form f\{x) = Co + CafXaf + Cwta^wt + Caca^ac- 
Table 9 shows the obtained regression coefficients after solving Plp- the instances 
of problem P^p take approximately one quarter of a second to run due to the small 
number of observations in the data sample. 

From Table 9, we conclude that a linear model with all three explanatory ran¬ 
dom variables is reasonable. It is interesting to note that the resulting coefficients of 
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Figure 13. Example C: Stack loss data scatterplot matrix. 
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Regression 

a 

Co 

Caf 

Cwt 

Cac 

Rl 

p2 

-^a,Adi 

Least Squares 

NA 

-39.9197 

0.7156 

1.2953 

-0.1521 

0.9136 

0.8983 


0.25 

-36.0000 

0.5000 

1.0000 

0.0000 

— 

— 

Quantile 

0.50 

-39.6899 

0.8319 

0.5739 

-0.0609 

— 

— 

0.75 

-54.1897 

0.8707 

0.9828 

0.0000 

— 

— 


0.90 

-58.5433 

0.7930 

1.3054 

0.0382 

— 

— 


0.25 

-55.1432 

0.8056 

1.2037 

0.0000 

0.7478 

0.7033 

Superquantile 

0.50 

-58.6210 

0.7930 

1.3054 

0.0382 

0.7750 

0.7353 

0.75 

-60.1368 

0.7500 

1.4561 

0.0570 

0.8050 

0.7706 


0.90 

-58.4620 

0.5246 

1.8584 

0.1073 

0.8231 

0.7919 


Table 9. Example C: Regression vectors, and -Ra.Adj linear model /i which 

includes all explanatory variables, and for different probability levels a. 


a 

0.05 

0.10 

0.15 

0.20 

0.25 

0.30 

0.35 

0.40 

0.45 

K 

0.7384 

0.7402 

0.7423 

0.7447 

0.7478 

0.7516 

0.7563 

0.7618 

0.7682 

a 

0.50 

0.55 

0.60 

0.65 

0.70 

0.75 

0.80 

0.85 

0.90 

K 

0.7750 

0.7818 

0.7883 

0.7944 

0.8001 

0.8050 

0.8110 

0.8173 

0.8231 


Table 10. Example C: Coefficients of determination for different probability levels a. 

determination and adjusted coefficients of determination superquantile 

regression increase with a, which lead us to further experiment for various probability 
levels a. Table 10 shows the obtained coefficients of determination for varying a. 

We next analyze a simpler model, using water temperature as the only available 
explanatory variable, and compare the corresponding linear f 2 {x) = Cq + c^t^wt and 
quadratic models fsi^x) = Co + Cwta^wt + Cwt2a^wtj Table 11. For the situation where 
one only has water temperature as the explanatory variable, applying the quadratic 
model /a slightly reduces the coefficients of determination. However we plot the 
obtained regression functions, see Figure 14(b), and notice that the quadratic models 
better represent the data. 

It is interesting to notice that the 0.90-quantile and 0.90-superquantile regres¬ 
sion functions are exactly the same in the quadratic model /a. This is due to a small 
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Model 

Regression 

a 

Co 

Cwt 

Cwt2 

Rl 

jd2 

^a.Ad] 


Least Squares 

NA 

-41.9109 

2.8174 

— 

0.7665 

0.7542 



0.25 

-32.0000 

2.1667 

— 

— 

— 


Quantile 

0.50 

-47.8571 

3.1429 

— 

— 

— 


0.75 

-41.0000 

2.8889 

— 

— 

— 

/2 


0.90 

-42.0000 

3.1111 

— 

— 

— 



0.25 

-43.6667 

3.0000 

— 

0.5649 

0.5420 


Superquantile 

0.50 

-41.7619 

3.0000 

— 

0.5954 

0.5741 


0.75 

-39.1905 

3.0000 

— 

0.6440 

0.6250 



0.90 

-38.0476 

3.0000 

— 

0.6715 

0.6540 


Least Squares 

NA 

151.5654 

-15.2555 

0.4131 

0.8755 

0.8617 



0.25 

148.6000 

-15.1583 

0.4083 

— 

— 


Quantile 

0.50 

200.8500 

-19.8333 

0.5167 

— 

— 


0.75 

110.1429 

-11.1381 

0.3191 

— 

— 

fs 


0.90 

205.5714 

-20.6714 

0.5571 

— 

— 



0.25 

167.5589 

-16.9167 

0.4583 

0.6676 

0.6306 


Superquantile 

0.50 

183.9524 

-18.5000 

0.5000 

0.6884 

0.6538 


0.75 

205.4789 

-20.6714 

0.5571 

0.7490 

0.7211 



0.90 

205.5714 

-20.6714 

0.5571 

0.7792 

0.7546 


Table 11. Example C: Regression vectors, and Ra^Adj linear and quadratic 
models, /2 and /s, respectively, for varying probability levels a. 


data set and how the observations are dispersed. For example, here we have three 
observations at sample point {x\y^) = (20,15). For such a very small data set, hav¬ 
ing coincident observations does not help obtaining better quantile or superquantile 
regression hts. We notice that the 0.75-quantile regression function is a clear example 
of how having small data sets aggravated by overlapping observations influences the 
obtained regression vector and may cause the function to shift accordingly. In this 
case, we realize that both 0.75-quantile and 0.75-superquantile regression functions 
cross the point = (20,15). 

As a conclusion to this example, we note that small data sets in superquantile 
regression are problematic to deal with. As a thumb rule, one needs 1/(1 — a) times 
more observations when performing superquantile regression than when in the case 
of least squares regression. Therefore the obtained approximating regression vectors 
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(a) Linear model f2ix) = cq + CwtXwt- 



(b) Quadratic model fsix) = cq + c^tXwt + Cwt2x'if 

Figure 14. Example C: Regression functions for linear and quadratic models. 
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for small data sets should be considered with care when used in decision making 
processes. 

D. INVESTMENT ANALYSIS 

The next example is a case study taken from the “Style Classihcation with 
Quantile Regression” documentation in Portfolio Safeguard, by American Optimal 
Decisions, Inc. (2011), and deals with the negative return of the Fidelity Magellan 
Fund as predicted by the explanatory variables Russell 1000 Growth Index (Xrlg); 
Russell 1000 Value Index (Xrlv); Russell Value Index (Xruj); and Russell 2000 
Growth Index (Xruo)- We change the orientation from “return” to “negative return” 
to be consistent with the orientation of a loss random variable in this dissertation. 
The indices classify the style of the fund; see American Optimal Decisions, Inc. (2011) 
for details. There are v = 1264 total observations available. 

We start by considering a linear model /i {x) = cq + CRLca^RLG + CRLva^RLV + 
cruj2^ruj + cruo^^ruo and compare the obtained approximate regression vectors for 
least squares, quantile, and superquantile regression models under a probability level 
a =0.75 and 0.90, as shown in Rows 2-6 of Table 12. DSqR' is solved through 
-^Num with Simpson’s rule as the integration scheme and p = 1000 subintervals, while 
quantile regression is carried out directly in Portfolio Safeguard Shell Environment 
(American Optimal Decisions, Inc., 2011). Table 12 (last column) also shows the cor¬ 
responding adjusted coefficients of determination. The hts are good and a majority of 
the variability in the data is captured. However, the small values of cruo and also the 
corresponding p-value from the least squares regression point to the possible merit 
of dropping Xrjjo as explanatory variable. We from now on focus on superquan- 
tile regression. A new model f 2 {x) = cq + crlo^^ulg + CRLva^RLV + crujXruj yields 
the approximate regression vectors of Table 12 (Rows 7-8), which also shows the ob¬ 
tained adjusted coefficients of determination R^^Ady The fact that we analyze Ra,Adj 
instead of R^ enable ns to better compare hts across models with different numbers 
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Model 

Regression 

a 

Co 

crlg 

crlv 

cruj 

cruo 

p2 

-^a,Adi 


LS 

NA 

0.0010 

-0.5089 

-0.5180 

0.0484 

0.0061 

0.9823 


Quantile 

0.75 

0.0045 

-0.5438 

-0.4518 

0.0159 

0.0173 

— 

/i 

0.90 

0.0089 

-0.5177 

-0.4602 

0.0156 

-0.0001 

— 


Super¬ 

0.75 

0.0095 

-0.5036 

-0.4723 

0.0192 

0.0009 

0.8731 


quantile 

0.90 

0.0138 

-0.4837 

-0.4912 

0.0223 

-0.0019 

0.8718 

/2 

Super¬ 

0.75 

0.0095 

-0.5028 

-0.4728 

0.0200 

— 

0.8733 

quantile 

0.90 

0.0138 

-0.4855 

-0.4906 

0.0210 

— 

0.8720 



0.75 

0.0137 

-0.8228 

— 

— 

— 

0.7380 



0.90 

0.0218 

-0.8189 

— 

— 

— 

0.7248 



0.75 

0.0321 

— 

-1.0668 

— 

— 

0.5940 

/a 

Super¬ 

0.90 

0.0475 

— 

-1.0727 

— 

— 

0.5702 

quantile 

0.75 

0.0515 

— 

— 

-0.7745 

— 

0.4103 



0.90 

0.0714 

— 

— 

-0.6949 

— 

0.4162 



0.75 

0.0344 

— 

— 

— 

-0.5498 

0.3962 



0.90 

0.0512 

— 

— 

— 

-0.5145 

0.2593 


Table 12. Example D: Approximate least squares (LS), quantile, and superquantile 
regression vectors and Ra,Adj models /i, / 2 , and f^. 


of explanatory variables. In comparison, the fit improves slightly by dropping Xruo- 
We further reduce the model to a single explanatory variable, fsi^x) = co + CiXi, 
with i G {RLG, RLV, RUJ, RUO}, and examine the four possibilities in Rows 9-16 
of Table 12. We hnd that Ra,Adj deteriorates, but only moderately for the model 
Co + crlg-^rlg- This simple model captures much of the variability in the data set. 
A somewhat poorer fit is achieved by Xrlv, which is illustrated in Figure 15, for a = 
0.90. That hgure also depicts the corresponding quantile and least squares regression 
lines. It is apparent that superquantile regression provides a distinct perspective from 
the other regression techniques of potentially signihcant value to a decision maker. 


E. U.S. NAVY HELICOPTER PILOTS DATA 

This example considers the results of an online survey of winged Naval heli¬ 
copter pilots of the U.S. Navy; see Phillips (2011) for details. Her goal is to verify 
if helicopter pilots back pain is a concern among the helicopter community and to 
define this problem’s implications. Although this is an important and real issue in 
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the helicopter community, we do not use the superquantile regression technique de¬ 
veloped in this dissertation to estimate helicopter pilots’ back pain frequency due to 
the categorical nature of this random variable. Instead we utilize the available data 
set to estimate the total flight hours Y for naval helicopter pilots. As explanatory 
variables we have the number of years a helicopter pilot has flown for the U.S. Navy 
(Xyears), and their body mass index (Xbmi), available through a formula derived using 
the available data on height and weight of helicopter pilots. 

Since we only consider those pilots that answered questions in the “Demo¬ 
graphics” and “Flight Hour Info” sections, see Appendix A in Phillips (2011), of the 
648 pilots that completed the survey, we only use 633 observations. Figure 16 displays 
these observations in a pairwise scatterplot matrix. As expected, one clearly depicts 
the linear correlation between years an helicopter pilot has flown for the U.S. Navy 
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and the estimated total number of flight hours. 

We hrst consider a regression function of the form f{x) = cq + Cyears^^years + 
CBMia^BMi and vary the probability levels a. Rows 2-10 in Table 13 report the obtained 


Model 

Regression 

a 

Co 

Cyears 

Cbmi 


p2 

^a,Ad] 


Least Squares 

NA 

51.70 

161.22 

0.9176 

0.7780 

0.7773 



0.25 

-48.71 

146.67 

0.3418 

— 

— 

/i 

Quantile 

0.50 

-55.56 

177.78 

0.0000 

— 

— 


0.75 

0.0000 

200.00 

0.0000 

— 

— 



0.99 

1233.3 

322.49 

-46.565 

— 

— 



0.25 

-47.03 

200.00 

0.000 

0.6094 

0.6081 


Superquantile 

0.50 

2.1827 

208.69 

-0.1809 

0.6205 

0.6193 


0.75 

116.71 

223.21 

-2.6097 

0.6147 

0.6134 



0.99 

244.33 

323.79 

-75.903 

0.4754 

0.4738 


Least Squares 

NA 

74.84 

161.30 

— 

0.7780 

0.7776 



0.25 

-40.00 

146.67 

— 

— 

— 

/2 

Quantile 

0.50 

-55.56 

177.78 

— 

— 

— 

0.75 

0.0000 

200.00 

— 

— 

— 



0.99 

-93.75 

343.75 

— 

— 

— 



0.25 

-47.03 

200.00 

— 

0.6094 

0.6088 


Superquantile 

0.50 

-1.781 

208.57 

— 

0.6205 

0.6199 


0.75 

49.721 

223.13 

— 

0.6146 

0.6140 



0.99 

247.55 

350.00 

— 

0.4538 

0.4529 


Table 13. Example E: Regression vectors, R^, and R^^Adj model fi{x) = cq + 

Cyearsa^years + CBMia^BMi and f2(x) = Cq + Cyears^^years at Varying probability levels a. 


solution vectors for model /i, the corresponding coefficients of determination R^, 
and adjusted coefficients of determination R^ Adj- The hts are reasonable but the 
p-value for cbmi from the least squares regression suggests the possible beneht of 
dropping Xbmi as explanatory variable. With this in mind, we drop the explanatory 
random variable Xbmi from our new model. Before we move on to the next model, we 
notice that the obtained 0.99-quantile regression solution vector is correct although 
its intercept looks way larger compared to other approximate solution vectors. 

Second we consider a single-variable model of the form f 2 {x) = Co + Cyearsa^years, 































and obtain the results presented in Rows 11-19 of the same Table 13. The adjusted 
coefficients of determination Ra,Adj slightly increase in the cases of least squares and 
superquantile regressions techniques, where a G {0.25, 0.50, 0.75}. Figure 17(a) shows 
the corresponding regression lines for the linear model / 2 , at a hxed probability level 
a = 0.50. It is interesting to notice that the quantile regression line for a = 0.50 has 
a negative intercept, while the least squares and superquantile regression functions 
intercept the y-axis at higher values. Another aspect we learn from Figure 17(a) is 
the importance of the magnitude of errors in regression. This is evident when we 
compare both quantile and superquantile regression lines. Superquantile regression 
responds to the observations that have larger errors, emphasizing those observations 
that we might consider outliers. 

Both least squares and quantile regression functions cross each other at Xyears = 
7.91 years. The observation (xyears,!/) = (2000,4) shifts the least squares regression 
line upwards for smaller values of Xyears, while the large number of helicopter pilots 
with 3 and 4 years flying for the U.S. Navy with low total flight hours shifts the 
quantile regression model downwards. 

In Figure 17(b), we see the least squares regression model and the superquan¬ 
tile regression functions for probability levels a G {0.25,0.50,0.75,0.99,0.999}. We 
notice that the superquantile regression models for a = 0.99 and a = 0.999 have 
higher slopes when compared to the remainder of superquantile regression lines. Even 
the difference in slopes established by the small increase of 0.009 in a provides us the 
conclusion that deciding which probability level to use in an analysis is a hard process. 
Since obtaining these superquantile regression models is not too costly, we consider 
important to include several choices of probability levels a in any analysis. 

From this example we conclude that superquantile regression helps analysts 
address important questions such as level and trends of the average 1% highest total 
flight hours (in the case of a = 0.99, in Figure 17(b)), understand if deployment rules 
should be reviewed, and if these cases should be analyzed before reassigning them for 
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(a) Regression lines for model f 2 {x) = Cq + Cyears^ears- 



(b) Least squares and superquantile regression functions for model 
/2 = Co + Cyears^ears, for G {0.25,0.50,0.75,0.99,0.999}. 


Figure 17. Example E: Superquantile regression applied to the U.S. Navy helicopter 
pilots data. 
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future deployments. 


F. PORTUGUESE SUBMARINERS EFFORT INDEX 

The next example is based on a data set provided by the Portuguese Navy 
Submarine Squadron. We seek to estimate the random variable Y that represents 
the effort index of the Portuguese submariners. This index was created as a decision 
tool to support human resource management inside the Submarine Squadron. Once a 
sailor becomes a submariner, his career depends mainly on the Submarine Squadron. 
The Commanding Officer of the Submarine Squadron has the power of assigning a 
submariner for a mission, if there is the need to embark an extra element or substitute 
someone onboard. It is crucial to support such decisions with a tool that emphasizes 
who is more “available” for the mission. 

The idea behind this index is to build in the near future a prototype for 
submariners careers which helps determine selection criteria for future Submarine 
Squadron personnel recruitment and also understand who has been overemployed. 

In the data set, we have 103 observations with hve possible explanatory vari¬ 
ables: years since a submariner has gained the insignia of the Portuguese submarine 
service (Xdoiphins), years a submariner has embarked on surface warships (Xsurf), years 
a submariner has been ashore (Washore), total submarine navigation hours (Xsub), and 
submariners age (Wage). 

Naturally one thinks that age is an important factor that needs to be taken 
into consideration. The idea of older submariners having more experience due to 
more training has not always been true, and that issue raised the question of how to 
quantify training and expertise. Figure 18(a) shows that although age is important, 
it does not directly translate the effort of a submariner. For example, a 39-year old 
submariner can have an effort index as low as 5 or as high as 22. Such discrepancies 
cause discomfort among fellow submariners. 

Another factor that is also relevant for designing a prototype for submariners 
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30 35 40 45 


Submariners Age 

(a) Effort index versus submariners ages. 



0 5 10 15 20 

Years with Dolphins 

(b) Effort index versus years submariners have the dolphins. 


Figure 18. Example F; Portuguese submariners effort index against their ages and 
years they have the submariners insignia. 
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careers is the number of years a submariner has the insignia of the Portuguese subma¬ 
rine service. Analogously to age, one thinks that the larger this number, the higher 
the effort index. Figure 18(b) shows how the effort index behaves with the number of 
years a submariner has the dolphins, and we realize there is an increasing variability 
among these observations. 

For now we consider higher effort indices to be more detrimental than small 
indices for the completion of the Submarine Squadron mission, i.e., overemploying is 
considered worse than underemploying a submariner. 

One of the goals with this example is to show that superquantile regression 
helps us better visualize what may cause the discrepancies in effort indices among 
submariners. 

We next observe the possible correlations between variables in the data set. In 
Figure 19, we have the scatterplot matrix of the data set for some of the explanatory 
random variables, X^oiphins, and Xashore, against the effort index V. Here we 

can observe a linear correlation between the number of years a submariner has the 
dolphins and the effort index. Since the total submarine navigation hours Xhours is 
a factor considered in the computation of the effort index, their correlation is very 
high and we do not include this variable in the scatterplot or later in the analysis. 
We explore several possible models and compare the obtained solution vectors and 
coefficient of determination results for further analysis. 

In Figure 20, we plot the submariners ages against the number of years a 
submariner has the insignia of the Portuguese submarine service. A small detail 
that we encounter here is the lack of observations for values of Xdoiphins between 4 
and 7 years. This lack of observations is due to fact that Portugal acquired the 
Tridente-class submarines in 2010, and the few years prior were dedicated to training 
the existing submariners to a completely new technology. This process required the 
Portuguese Navy to delay the submariners course until after the reception of the new 
assets. 
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Figure 19. Example F: Portuguese submariners effort index scatterplot matrix. 
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Figure 20. Example F: Submariners ages against the number of years they have the 
submariners insignia. 

We hrst start with a linear model of the form fi{x) = Cq + CiXi, with i G 
{dolphins, surf, ashore, age}, i.e., we only include one explanatory variable at a time. 
We then consider a linear model / 2 Cdoiphinsa^doiphins + Cage^^age- Table 14 presents the 
obtained solution vectors and the corresponding coefficients of determination, for a 
probability level a = 0.75. The years the submariners embark in surface warships 
and the number of years they spend ashore between embarks are two explanatory 
variables that we discard from this point on, because they both play a very negligible 
role, as determined by even though they might be important in conjunction 

with other explanatory random variables. Rows 6, 11, and 16 of Table 14 report 
the regression vectors for model f 2 - We realize that the coefficient of determination 
improves in these cases. 
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Regression 

Co 

^dolphins 

^surf 

^ashore 

^age 

]d2 

-^0.75 


3.1365 

0.8643 

— 

— 

— 

0.7452 


11.9111 

— 

-0.4448 

— 

— 

0.0182 

Least Squares 

8.3782 

— 

— 

0.7491 

— 

0.2314 


-17.6369 

— 

— 

— 

0.7983 

0.4845 


8.1218 

0.9918 

— 

— 

-0.1711 

0.7512 


2.8690 

1.0878 

— 

— 

— 

— 


15.8063 

— 

-0.6190 

— 

— 

— 

Quantile 

10.7798 

— 

— 

0.7945 

— 

— 


-19.2554 

— 

— 

— 

0.9084 

— 


11.7357 

1.3037 

— 

— 

-0.3065 

— 


2.9811 

1.2172 

— 

— 

— 

0.5866 


17.6450 

— 

0.3456 

— 

— 

0.0038 

Superquantile 

15.4621 

— 

— 

0.5666 

— 

0.0212 


-27.0234 

— 

— 

— 

1.2048 

0.2403 


7.3697 

1.3430 

— 

— 

-0.1558 

0.5939 


Table 14. Example F: Regression vectors and for linear models fi and / 2 , at a 
fixed probability level a = 0.75. 

In Figure 21, we plot the linear model fi{x) = cq + Cdoiphinsa^doiphins for least 
squares, 0.60-quantile and 0.60-superquantile regressions. All three obtained regres¬ 
sion functions have completely distinct slopes. The blue line representing the quantile 
regression gives us the notion of where the 40% worst cases are, while the green line 
representing the superquantile regression model provides us the average of these worst 
indices. 

As stated at the beginning of this example, the orientation of the problem is 
such that higher effort indices are worse. However and as illustration we believe it is 
very beneficial to look at the cases where the submariners effort is low and therefore we 
flip the orientation of this problem for the next hgure in order to highlight those cases 
that should also be taken into consideration. This is a good example where using one 
of both orientations in solving the problem is possible depending on where the major 
concern lies. Figure 22 shows the least squares regression model and the 0.75-quantile 
and 0.75-superquantile regression functions. We add the 0.25-quantile regression £t 
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Figure 21. Example F: Regression lines for model fi{x) = cq + Cdoiphinsa^doiphins- 

and the new 0.75-superquantile regression function, after flipping the orientation of 
the problem and solving for the new superquantile regression problem, marked with an 
asterisk in the legend, and displayed in Figure 22 by a dashed green line. This dashed 
line has the same meaning as the full green line but for the lowest 25% presented effort 
indices. We consider that taking care of both ends of the spectrum will expedite the 
process of smoothing the submariners career, but this is not fully pursued here. We 
hnish using the linear model fi{x) = Cq + Cdoiphins^^doiphms by showing Figure 23, 
where clearly the 0.25-superquantile regression model is completely different of the 
0.75-superquantile* regression model. 

Second, we consider a quadratic model of the form fs^x) = cq -|- Cage^^age + 
Cage22^age- Table 15 shows the obtained regression vectors and the corresponding co- 
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Figure 22. Example F: Least squares, quantile and superquantile regression func¬ 
tions for linear model /i. An asterisk indicates that the 0.75-superquantile regression 
function was obtained after reversing the orientation of the original problem. 


Regression 

Co 

Cage 

Cage2 

td2 

-^0.75 

Least Squares 

-87.1182 

4.6498 

-0.05251 

0.5442 

Quantile 

-97.2181 

5.2652 

-0.0600 

— 

Superquantile 

-126.4859 

6.9812 

-0.0827 

0.3235 


Table 15. Example F: Regression vectors and R^, for quadratic model fsi^x) = cq + 
Cage^^age + Cage2a:age, with a = 0.75. 
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Figure 23. Example F: Different a-superquantile regression functions for linear model 
/i. An asterisk indicates that the 0.75-superquantile regression function was obtained 
after reversing the orientation of the original problem. 

efficients of determination, which are larger than those obtained using the linear 
model. We plot these quadratic models in Figure 24(a), and notice that the 0.75- 
superquantile regression function captures the effects of the higher effort indices and 
forms an interesting curvature. To the contrary, the 0.75-quantile regression model 
does not seem to be affected by such observations and it looks almost parallel to the 
least squares regression model for a 40-year old submariner. With these comments 
in mind, we need a different validation analysis tool that helps us understand which 
observations, if any exist, should be carefully checked for their validity, or should pos¬ 
sibly be seen as outliers. Before we hnish this example and as seen in Section II.C, we 
utilize the Cook’s distance concept hrst applied to the case of least squares regression, 
then to the case of superquantile regression. Since there is more than one possible 
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(a) Effort index versus submariners ages. 



(b) High leverage observations for quadratic model /a. 

Figure 24. Example F: Quadratic regression models /a at probability level a = 0.75. 
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cut-off value for such Cook’s distances, we resort to the commonly used formula 4/z/ 
in both cases, least squares and 0.75-superquantile regression, where we have v = 103 
observations in this example. Figure 25(a) shows the Cook’s distances for the least 
squares quadratic model, while Figure 25(b) the Cook’s distances for the superquan¬ 
tile regression technique. We clearly see that observations number 24, 40, 49, and 80, 
emphasized by the red dots in Figure 24(b) and 25(a), are considered high leverage 
observations for least squares regression. In the context of superquantile regression, 
we see that observations number 1, 19, 24, 27, 28, 31, and 32, emphasized by the green 
dots in Figure 24(b) and 25(b), are considered high leverage observations. Curiously, 
in our example only observation 24 is coincidently considered high leverage for both 
regression techniques; plotted in orange in Figure 24(b). Another interesting detail 
consists on where the high leverage observations are located in this same plot. We 
realize that these observations drive the superquantile regression £t downwards since 
they influence the 0.75-quantile regression function, and consequently also the result¬ 
ing 0.75-superquantile regression function as an average of all observations above the 
quantile regression fit. 

Further analysis should be done for this example but it goes beyond the scope 
of this dissertation. However we conclude that superquantile regression is an impor¬ 
tant analysis tool that when used wisely gives the decision maker powerful information 
on the upper tail of the random variable of concern. With the Cook’s distance con¬ 
cept applied to the superquantile regression, we also identify those observations in 
the data set that influence the obtained £t and that should be checked for validity. 

G. UNCERTAINTY QUANTIFICATION 

The next example arises in uncertainty quantification of a rectangular cross 
section of a short structural column, with depth d and width w, under uncertain yield 
stress and uncertain loads; see Eldred et ah (2011). Assuming an elastic-perfectly 
plastic material, a limit-state function that quantifies a relationship between loads 
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(a) Cook’s distances for least squares regression. 
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(b) Cook’s distances for superquantile regression. 

Figure 25. Example F: Cook’s distances for least squares and superquantile regression 
fits using quadratic model fsi^x) = cq + Cage^^age + Cage22:age) at a = 0.75. 
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and capacity is described by the random variable 




X| 


(IV.l) 


wd^Xs w‘^d?Xl' 

where the bending moment load Xi and the axial load X 2 are normally distributed 
with mean 2,000 and standard deviation 400, and mean 500 and standard deviation 
100 , respectively, and the material’s yield stress X 3 , is lognormally distributed with 
parameters 5 and 0.5, with Xi, X 2 , and X 3 independent. We observe that the second 
term in (IV.l) is the ratio of moment load to the column’s moment capacity, and 
the third term is the square of the ratio of the axial load to the axial capacity. The 
constant —1 is introduced for the sake of a translation such that positive realizations 
of Y represent “failure” and negative ones correspond to a situation where load effects 
remain within the capacity of the column. (We note that the orientation of the limit- 
state function is switched compared to that of Eldred et ah (2011) for consistency 
with our focus on “losses” instead of “gains.”) We set the width w = 3, and the 
depth d = 12 . 


Model 

a 

Co 

lO^ci 

IOV 2 

IOV 3 

Rl 


0.999 

-0.6797 

0.0156 

7.9000 

-9.1100 

0.154 


0.99 

-0.8084 

0.0150 

3.8000 

-8.2700 

0.190 

/i 

0.9 

-0.8579 

0.0107 

1.5900 

-7.7000 

0.260 


0.75 

-0.8705 

0.0090 

1.0800 

-7.5900 

0.301 


LS 

-0.8827 

0.0070 

0.5921 

-7.7180 

0.571 


0.999 

-1.0457 

1.8640 

0.0300 

— 

0.902 


0.99 

-1.0450 

1.6182 

0.0400 

— 

0.891 

/2 

0.9 

-1.0308 

1.3393 

0.0200 

— 

0.894 


0.75 

-1.0261 

1.2595 

0.0200 

— 

0.893 


LS 

-1.0179 

1.1315 

0.0056 

— 

0.979 


Table 16. Example G: Approximate regression vectors and coefficients of determina¬ 
tion for superquantile regression with varying a and least squares (LS) regression. 


We seek to quantify the “uncertainty” in Y by surrogate estimation. Of course, 
in this case, this is hardly necessary; direct use of (IV.l) suffices. However, in practice, 
an analytic expression for a limit-state function, as in (IV.l), is rarely available. One 
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then proceeds with determining a regression function / ; ]R^ —)■ ]R, based on a sample 
of input-output realizations, such that f{X), with X = {Xi, X 21 X^), approximates 
Y in some sense. To mimic this situation, we consider a sample of size 50000 drawn 
independently from X, the corresponding realizations of Y according to (IV. 1), and 
two forms of the regression function. The hrst model is linear and takes the form 

fl{x) = Co CiXi C 2 X 2 + 03 X 3 

and the second one utilizes basis functions hi{x) = xi/x^, and h 2 {x) = {X 2 /X 3 Y and 
is of the form 

f2{x) = Co + CiXi/Xs C2x\lx\. 

In view of (IV. 1), we expect /i to be unable to capture interaction effects between 
variables and its explanatory power may be limited. In contrast, /2 uses the correct 
basis functions, but even then f 2 {X) may deviate from Y due to the hnite sample size 
used to determine the regression vector. Table 16 confirms this intuition by showing 
approximate regression vectors for both models over a range of probability levels a 
as well as for the least squares (LS) regression. The vectors are obtained in less than 
15 seconds by solving with z/ = 50000, p = 1000, and Simpson’s rule. The last 

column of Table 16 shows R^, which is low for fi and high for /2 as expected. 

In uncertainty quantihcation and elsewhere, surrogate estimates such as fi{X) 
and f 2 {X) are important inputs to further analysis and simulation. Table 17 illus¬ 
trates the quality of these surrogate estimates in this regard by showing various 
statistics of fi{X) and / 2 (X) as compared to those of Y. Row 2, Columns 3-10 show 
estimated mean, standard deviation, superquantiles at 0.75, 0.9, 0.99, 0.999, probabil¬ 
ity of failure, and buffered probability of failure (see (II.5)) of V, respectively, using a 
sample size of 10^ and standard estimators. Coefficients of variation for these estima¬ 
tors are ranging, approximately, from 10“^ for the mean to 0.02 for the probability of 
failure. Rows 3-6 of Table 17 show similar results, using the same sample, for fi{X), 
with a = 0.999, 0.99, 0.9, and 0.75, respectively. We notice that as the probability 
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level a increases, fi{X) becomes increasingly conservative. In fact, for a = 0.999, 
fi{X) is conservative in all statistics. Superquantile regression with smaller proba¬ 
bility level a fails to be conservative for some “upper-tail” statistics. Interestingly, 
/i(X) based on a is conservative for all superquantiles up to and including in these 
tests. These observations indicate that in surrogate estimation the probability level a 
should be selected in accordance with the superquantile statistic of interest. We can 
then expect to obtain conservative estimates even for relatively poor surrogates. Row 
7 of Table 17 gives corresponding results for fi{X) under the least squares regression 
£t. While this fit provides an accurate estimate of the mean (see Column 3), the 
upper-tail behavior is represented in a nonconservative manner. 

Rows 8-12 of Table 17 show comparable results to those above, but for the 
/ 2 (X) models. As also indicated in Table 16, f 2 {X) is a much better surrogate 
of Y than fi{X) and essentially all quantities improve in accuracy. For example, 
f 2 {X) based on superquantile regression overestimates the buffered failure probabil¬ 
ity only moderately with a = 0.999, 0.99, and 0.9, and slightly underestimate with 
a = 0.75; see the last column of Table 17. In contrast, least squares regression un¬ 
derestimates the buffered failure probability substantially even for this supposedly 
“accurate” model. Of course, least squares regression centers on conditional expecta¬ 
tions and as basis for estimating tail behavior may hide potentially dangerous risks. 


116 




CMIOOOOOCOCDCMOIO 
LO 03 ^ O CO 03 

C3 03 t- CNj CO 00 

^ CO 03 ^ ^ CZ3 CZ3 


10000000003CNI'^03 
1^ CO ^ o o 03 CO 

Lo 00 03 ^ lo 

CO ^ ^ CO ^ 

03 00 CO ^ 03 03 03 


03coo)03cocoj0)i0'^ 

co'^oo^Loooi^ioo 


r >0 OC 

oS'^ooi^oo03coco,.,. 

g^^-cfl003^^^03^ 

Q Q o o o o ■ • • 


o o o 


ioo3'0)i^coco^03^ 

Od^Ot^iO^COCClCO 

CMCM^oo^oocNioo 

OdiOCOt^O^CCICOCO 

030303030303030303 


^coC 000 ^ 03 ^ 0000 LO 

^^cocM^cocoo'^t-co 

cogcsiocoi^cq'^LOiocD 

03H00303030303030303 


co^ocoioco^cNiooi^ 

^JoCOOJ^cocoocDooo 

t^oCOCOCOl^iOCDCDCDt^ 

03o00303030303030303 


1^^031000'^CDC003 

cooi^cocoocDooo 

COCOCOl^iOCDCDCDt^ 


cot^t^oot^ooi^^ot^coo 
03 03o;io;it-'^'^cooo^o 
03C<10001^1^'0)'^^^0 

iOi T-—I 1—I ‘O’ ‘O’ '—I '—I '—I '—I 

‘ 03 ‘ 03 ‘ 03 ‘ 03 ‘ 03 ‘ 03 ‘ 03 < 03 ‘ 03 < 03<03 


'0)03100^03^<03C01^^ 

coioi^^^co^i^'ojcoio 

'^tMiOO3O)'^O)O3CN|C0'‘:f 

oo^'^'oji^ooi^i^oooooo 

‘03‘03‘03‘03‘03‘03‘03‘03‘03‘03<03 




Q o ^ o 


2 o ° o ^ 



117 



H. SUPERQUANTILE TRACKING 

To finish Chapter IV, we return to the hrst example in Section IV.A, and 
consider a loss random variable 


V = Xi + Xae, 

where e is a standard normal random variable and X = {Xi,X 2 ) is uniformly dis¬ 
tributed on [—1,1] X [0,1], with e,Xi, and X 2 independent. We consider a regression 
function of the form f{x) = cq + ciXi + C 2 X 2 and set a = 0.90. 

We examine conditional values of Y given realizations of X = {Xi,X 2 ), i.e., 
superquantile tracking. For x = {xi,X 2 ), Y{x) = Y\X = a; is normally distributed 
with mean xi and variance Consequently, it is straightforward to compute that 
qo,g(Y{x)) = Xi + 1.7550x2. Table 2 shows vectors that only track go.9(^(')) approx¬ 
imately, as Co, Cl, and C 2 deviate from 0, 1, and 1.755, respectively. In fact, there is 
in general no guarantee that every regression function / will satisfy /(x) = qa(Y{x)) 
for all X, even for large sample sizes. As indicated by Proposition II.5, however, a 
superquantile of Y (x) can be estimated by approximating a degenerate distribution 
of (X, V) at X. 


X range: 

' 1' 

X 

[0.45,0.55]2 

[0.495, 0.505]2 

Cq 0.5Ci “h 0.5 c2 

(1.349, 1.575) 

(1.329, 1.475) 

(1.330, 1.473) 

Cq 

(0.029, 0.123) 

(-2.414, 1.784) 

(-23.715, 18.329) 

Cl 

(0.971, 1.075) 

(-0.229, 3.597) 

(-11.063, 25.656) 

C2 

(1.523, 1.975) 

(-1.686, 5.186) 

(-33.916, 35.701) 


Table 18. Example H; Approximate 95% conhdence intervals when tracking qo,g{Y (■)) 
near x = (0.5,0.5) using shrinking sampling ranges for X. The correct value 
go.9(h'((0.5, 0.5))) = 1.378. 

Table 18 shows such “local” estimates of go.9(^ ( 2 ^)) near x = (0.5, 0.5). Specih- 
cally, using v = 500 we compute cq, ci, and C 2 by solving as above, with X sampled 
uniformly from [—1,1] x [0,1]. We repeat these calculations 10 times with independent 
samples and obtain the aggregated statistics of Column 2 of Table 18. The second row 
gives an approximate 95% conhdence interval for the mean value of cq -|- 0.5ci -|- 0.5c2 
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across the 10 meta-replications. The interval contains go.9(^((0.5, 0.5))) = 1.3775, 
but is somewhat wide. Proposition II.5 indicates that sampling from a smaller set 
[0.45,0.55] X [0.45,0.55] will tend to improve the estimate of go.9(d^((0.5, 0.5))). Col¬ 
umn 3 of Table 18 illustrates this effect, by showing results comparable to those of 
Column 2 and Row 2, but for the smaller interval. As expected, the conhdence in¬ 
terval for Co -|- 0.5ci -|- 0.5 c2 narrows around the correct value. The last column shows 
similar results, but now for sampling of X uniformly on [0.495, 0.505] x [0.495, 0.505]. 
The estimate of go.9(^((0.5, 0.5))) improves only marginally, with the residual uncer¬ 
tainty being due to the inherent variability in the (relatively small) samples. The 
narrow sampling interval causes the last estimate to be similar to that obtained by 
the standard empirical estimate from 500 realizations of R((0.5, 0.5)), which yields 
the conhdence interval (1.312,1.462). 

While sampling on smaller sets gives better local estimates of qo.giXix)), the 
global picture deteriorates. The last three rows of Table 18 show corresponding 
approximate 95% conhdence intervals for cq, Ci, and C 2 , respectively. While Co-t-CiXi-l- 
C 2 X 2 generated by the set [—1,1] x [0,1] provides a reasonably good global picture 
of qo_g(Y{x)), the smaller sets lose that quality as seen from the wide conhdence 
intervals. In view of the above results, we see that an analyst that can choose “design 
points,” i.e., points x at which to sample Y{x), should balance the need for accurate 
local estimates with that of global estimates. In fact, even if the primary focus is 
on estimating qa(Y{x)) for a given x, as we see in this example, it may be equally 
ehective to spread the samples of X near x instead of exactly at x, and then obtain 
some global information about qa{Y{-)) too. Our methodology provides a hexible 
framework for estimating qa{Y (a^)) even if there is only a small number of realization 
of Y[x), or even none, available. The estimates are based on realizations of Y[x') for 
x' near x. 

In the next chapter we discuss the conclusions taken from our research and 
suggest possible future work. 
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V. SUMMARY, CONCLUSIONS, AND 

FUTURE WORK 

A. SUMMARY AND CONCLUSIONS 

In this dissertation, we develop a novel regression framework, superquantile 
regression, that naturally extends least squares and quantile regressions to contexts 
where the decision maker is risk averse and is simultaneously concerned about the 
magnitude of the obtained regression errors. As opposed to squaring these errors or 
by looking at their signs, this framework for superquantile regression weights larger 
errors increasingly heavily in a way consistent with a coherent and averse risk mea¬ 
sure, the superquantile risk measure. We use superquantiles directly in the regression 
model and go beyond other generalized regression techniques that approximate condi¬ 
tional superquantiles by various combinations of conditional quantiles, with the only 
required assumption that the involved random variables have hnite second moment. 

We utilize the “Fundamental Risk Quadrangle” concept and the connections 
established therein between distinct measures of a random variable whose orientation 
is such that upper-tail realizations are unfortunate and low realizations are favorable. 
We rely on the superquantile-based risk quadrangle and the corresponding relations 
between measures of deviation, risk, and error applied to the superquantile as the 
statistic to obtain superquantile regression functions as optimal solutions of an error 
minimization problem. 

Then we develop the fundamental theory for superquantile regression by dehn- 
ing its regression problem as an error minimization problem. We examine existence 
and uniqueness of the obtained regression functions, and we establish a guaranteed 
unique regression vector in the cases where the loss random variable and the chosen 
basis functions are normally distributed with a positive dehnite variance-covariance 
matrix. Next we analyze consistency and stability of the regression functions under 
perturbations due to possible measurement errors and approximating empirical distri- 
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butions generated by samples of the underlying data. We formulate a deviation-based 
superquantile regression problem as an equivalent minimization problem of a corre¬ 
sponding measure of deviation taken from the superquantile-based risk quadrangle. 
This new minimization problem implies computational advantages since it reduces the 
number of variables and includes a simpler objective function. We also provide rate 
of convergence results under mild assumptions that allow us to use an approximate 
superquantile regression problem, based on a sample of the true distribution. 

Since any regression framework must be associated with means of assessing 
the goodness of £t of a computed regression vector, we define three validation analy¬ 
sis tools for quantile and superquantile regressions: the coefficient of determination, 
the adjusted coefficient of determination, and Cook’s distance. We first analyze the 
formulas for these three validation analysis tools when applied to least squares re¬ 
gression, and translate them into measures of error and deviation in the sense of the 
mean-based quadrangle. We conclude that these three definitions can be formulated 
for any generalized regression consisting of minimizing an error random variable. 

Concerning computational methods for solving superquantile regression prob¬ 
lems, we develop two distinct classes: the primal methods where one solves the su¬ 
perquantile regression problem by means of analytical and numerical integration tech¬ 
niques, and the dual methods where one utilizes the dualization of risk as part of the 
objective function of the new regression problem that we apply to discrete cases. 

In terms of complexity, our results indicate that the dual methods outperform 
the primal methods in most of the cases, especially for large sample sizes. We com¬ 
pare computational methods by presenting their runtimes and realize that using dual 
methods is a quite fast process and in fact, for reasonable sample sizes, is not much 
slower than least squares regression. While the primal method with analytical inte¬ 
gration retrieves the exact solutions, it takes too long to run and requires too much 
memory for sample sizes larger than 1000 observations. 

Our results show that superquantile regression is computationally tractable. 
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offers new insight about the upper-tail-behavior for quantities of interest, and provides 
a complementary tool for risk-averse decision makers. 

B. FUTURE WORK 

Similarly to what is done for quantile regression, future work could extend 
statistical inference and predictive analysis applied to superquantile regression. Also 
one could further research model validation analysis tools, and address significance 
testing for superquantile regression. Much research also remains to be done on su¬ 
perquantile tracking. Furthermore, one could build on an i?-package to implement 
superquantile regression and the respective supporting documentation. 
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